marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

how to regenerate the readNames.txt #2299

Closed xychen233 closed 2 months ago

xychen233 commented 3 months ago

Dear Canu team,

I'm assembling a genome with canu 2.2, and it is going quite well. I wanted to see which reads it used, but found that the readNames.txt file was not generated in the seqStore directory. Could you please tell me how to regenerate readNames.txt? The canu command we used: /hwfssz1/ST_EARTH/Reference/ST_AGRIC/APP/canu-2.2/bin/canu -p test -d test useGrid=remote minReadLength=10000 minOverlapLength=3000 readSamplingBias=1.0 genomeSize=70m corOutCoverage=150 correctedErrorRate=0.12 -nanopore input.fastq.gz

Your reply would be greatly appreciated!

chen

skoren commented 3 months ago

It should still be created; I confirmed that v2.2 creates it in my test run. I think it must have been erased at some point during the run (canu doesn't actually use that file). Your best option is probably to re-run the store creation script. Unfortunately, the store you have now holds extra information (such as the read corrections) that would be lost if you removed it. So I suggest modifying the test/test.seqStore.sh build script to build a seqStore under a new name, which should create a new readNames.txt. Confirm that both stores have the same number of reads and bases in uncorrected reads (sqStoreDumpMetaData -stats -S test/test.seqStore, and the same for whatever you name your new store), and then copy the readNames.txt file from the new store to the old one.
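
The last two steps (check the stats, then copy the file) could be sketched roughly as below. This is only an illustration, not part of canu: the helper names `show_stats` and `copy_read_names` and the new store name `test/test.new.seqStore` are hypothetical, and it assumes `sqStoreDumpMetaData` is on your `PATH`. The only canu command used is the one quoted above; comparing the uncorrected read and base counts is left to you, since the sketch makes no assumption about the output format.

```python
import shutil
import subprocess
from pathlib import Path


def show_stats(store: str) -> None:
    """Run the stats command quoted above; its output goes to the terminal."""
    subprocess.run(["sqStoreDumpMetaData", "-stats", "-S", store], check=True)


def copy_read_names(new_store: str, old_store: str) -> Path:
    """Copy readNames.txt from the rebuilt store into the original store.

    Refuses to overwrite an existing file, so the original store is only
    ever added to, never modified.
    """
    src = Path(new_store) / "readNames.txt"
    dst = Path(old_store) / "readNames.txt"
    if not src.is_file():
        raise FileNotFoundError(f"no readNames.txt in {new_store}")
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; not overwriting")
    shutil.copy2(src, dst)
    return dst
```

A possible session would call `show_stats("test/test.seqStore")` and `show_stats("test/test.new.seqStore")`, compare the uncorrected read and base counts by eye, and only then call `copy_read_names("test/test.new.seqStore", "test/test.seqStore")`.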

skoren commented 2 months ago

Idle