lh3 / miniasm

Ultrafast de novo assembly for long noisy reads (though having no consensus step)
MIT License

GFA with '*' sequences #61

Open sage-wright opened 5 years ago

sage-wright commented 5 years ago

I have a similar but perhaps different issue to Issue #45 (No sequences in gfa - unable to convert to fasta). I have used the -f option (miniasm -f reads.fq overlaps.paf) but there are no sequences in the .gfa file. Where the sequences would be, I only find a '*' placeholder. The log file does not present any errors or issues, and everything seems to be running fine.

Every S line in the .gfa has the following format:

S utg000001l * LN:i:518592

Which is then followed by many lines starting with 'a'.
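For anyone wanting to check how many segments are affected before trying a conversion to FASTA, here is a minimal sketch in Python. It assumes only the S-line layout shown above (record type, segment name, sequence, optional tags). The function name `summarize_gfa_segments` and the sample lines are made up for illustration and are not part of miniasm.

```python
def summarize_gfa_segments(lines):
    """Count S lines and list the segments whose sequence field is '*'."""
    total = 0
    placeholders = []
    for line in lines:
        # GFA fields are tab-separated; split() also tolerates spaces
        # in text copied from a web page.
        fields = line.split()
        if len(fields) < 3 or fields[0] != "S":
            continue  # skip 'a' lines, links, headers, blank lines
        total += 1
        if fields[2] == "*":
            placeholders.append(fields[1])
    return total, placeholders


# Illustrative input only: the second segment and the 'a' line are invented.
example = [
    "S\tutg000001l\t*\tLN:i:518592\n",
    "a\tutg000001l\t0\tread1:1-1000\t+\t1000\n",
    "S\tutg000002l\tACGT\tLN:i:4\n",
]

total, missing = summarize_gfa_segments(example)
print(f"{len(missing)} of {total} segments have no sequence: {missing}")
```

To run it on a real file, pass an open file handle instead of the list, e.g. `summarize_gfa_segments(open("assembly.gfa"))`, where `assembly.gfa` is a placeholder for your own output file.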

The .gfa format specification indicates that the sequence can be found in a linked fasta file whenever there is an asterisk placeholder. However, I'm not sure what that sequence actually is, or where to find this linked fasta file.

I tried the most current version of miniasm (as of this date) and version 0.2-r128, and both produced an asterisk for the sequence position. Previous experience with miniasm has not led to this outcome.

There were no issues with the generation of the .paf file.

Do you have any suggestions as to how to resolve this? Could this be related to the large size of the reads (many >40,000 bp)?

Thank you for your assistance!

palfalvi commented 5 years ago

Hey!

I am experiencing the same issue: the gfa file contains only * instead of the sequence. I have ~31x coverage ONT data on a 1 Gb genome. The .paf was generated without a problem. I am using miniasm v0.3-r179. Is there any solution to this problem? Thanks in advance!

guangtugao commented 5 years ago

I had the same issue with -c3, but got the sequences with -c2. I saw the discussion in #36 where Dr. Heng Li mentioned the program mistakenly using 31-bit integers somewhere. This looks like a similar problem. Hope the bug gets fixed soon! Thanks!