Closed rob-p closed 7 years ago
Hi Rob, glad that you find TwoPaCo useful. I will take a look shortly and let you know.
Are there any sequences shorter than 31 in the input file?
Hi Ilya,
There are 11 such sequences in this file. I understand that, clearly, they cannot be represented in the 31-mer dBG. Do I need to remove them before feeding the file to TwoPaCo? I'll check to see if this resolves the graphdump issue.
Thanks, Rob
@ilyaminkin,
Indeed, this resolves the issue, and graphdump runs to completion now without error. Thanks for pointing this out. I should have checked for such sequences before processing the file. Might I suggest that TwoPaCo emit a warning when it encounters such records, to notify the user that they might cause issues?
Thanks! Rob
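For anyone hitting the same problem, the workaround discussed here (dropping records shorter than k before running TwoPaCo) can be sketched in a few lines of Python. This is only a minimal sketch, not part of TwoPaCo: the function names are made up for illustration, k = 31 is taken from this thread, and plain FASTA input is assumed.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_short(records, k=31, warn=sys.stderr):
    """Yield only records with at least k bases; warn about the rest."""
    for header, seq in records:
        if len(seq) < k:
            # A record shorter than k contains no k-mer at all.
            print(f"warning: skipping {header!r} (length {len(seq)} < k={k})",
                  file=warn)
            continue
        yield header, seq


def write_fasta(records, out, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` bases."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
```

To use it as a filter, something like `write_fasta(filter_short(read_fasta(sys.stdin), k=31), sys.stdout)` reads FASTA on standard input and writes the filtered file to standard output, with the skipped records listed on standard error.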
@rob-p Not handling those sequences properly is clearly a bug on TwoPaCo's side. Thank you for reporting it. I am not yet sure what the proper behaviour should be in that case; probably ignoring those sequences and showing a warning to the user, as you suggested.
Hi guys,
First of all, awesome work on TwoPaCo: the method and the software are both fantastically useful, and I'm really excited to start using it for some downstream applications we're working on. I'm running into the following issue. I used TwoPaCo to build the compacted dBG for the human transcriptome, using the following command:
As the name suggests, the reference is the set of protein-coding human transcripts from GENCODE v25. This seems to work fine, and I get the following output from TwoPaCo:
Now, I want to convert this output to a GFA format (I tried both GFA1 and 2 and get the same error in each case). I used the following command:
This results in the following error message:
At this point, some output has been generated, but I presume it's not complete because, despite the fact that there are ~96k input transcripts, I only get 35,451 output paths (i.e., P entries) in the resulting GFA file. Any idea what might be causing this issue or how to fix it?
Thanks! Rob
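A quick way to reproduce the comparison above (number of input transcripts versus number of P entries) is to count FASTA headers and tally GFA records by their first tab-separated field. The snippet below is a rough sketch with made-up function names; it assumes plain-text FASTA and GFA files and makes no assumptions about TwoPaCo or graphdump themselves.

```python
from collections import Counter


def gfa_record_counts(lines):
    """Count GFA records by type, i.e. by the first tab-separated field."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            counts[line.split("\t", 1)[0]] += 1
    return counts


def fasta_record_count(lines):
    """Count FASTA records by counting header lines."""
    return sum(1 for line in lines if line.startswith(">"))
```

For example, with hypothetical file names, `gfa_record_counts(open("out.gfa"))["P"]` gives the number of path lines, which can be compared against `fasta_record_count(open("transcripts.fa"))`.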