Closed grendon closed 3 years ago
I changed the config.txt file for MaSuRCA so that it accepts both paired-end and single-end data. The updated config.txt is:
DATA
PE = pe 150 50 PEr1_trim.fastq PEr2_trim.fastq
PE = s1 150 50 SEr1_trim.fastq
PE = s2 150 50 SEr2_trim.fastq
END

PARAMETERS
GRAPH_KMER_SIZE = auto
USE_LINKING_MATES = 1
LIMIT_JUMP_COVERAGE = 300
CA_PARAMETERS = cgwErrorRate=0.15
KMER_COUNT_THRESHOLD = 1
NUM_THREADS = 16
JF_SIZE = 200000000
SOAP_ASSEMBLY=0
DO_HOMOPOLYMER_TRIM=0
END
Task can use unaligned example data interactively.
May require some scripting.
@kkowalden here is the ticket for tracking; note @yuantianhpc's comment above with an example config file. I have a few example data sets in /home/classroom/hpcbio/h3a/example_results/bowtie2-assembly/trimmed. The PE.R1/R2 reads are paired; the orphans reads are single-end and unmapped. You can also add the 'unpR1' and 'unpR2' reads as SE, though these are from after the trimming step, so they may be artifactual.
A separate config file needs to be prepared for each sample (see https://github.com/alekseyzimin/masurca).
This needs to be done inside the process itself. I am still trying to figure this part out.
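One way to approach the per-sample config might be a small helper that builds the config text from the read file names, so the process can write it just before running the assembler. This is only a rough sketch: the function names `masurca_config` and `write_config`, the `<sample>_config.txt` naming, and the default insert size of 150 with standard deviation 50 are assumptions for illustration (the numbers are copied from the example config above). The limit of nine single-end libraries reflects my reading of the MaSuRCA README, which describes the library prefix as two characters.

```python
from pathlib import Path

# Parameter block copied from the example config earlier in this thread.
PARAMETERS = """PARAMETERS
GRAPH_KMER_SIZE = auto
USE_LINKING_MATES = 1
LIMIT_JUMP_COVERAGE = 300
CA_PARAMETERS = cgwErrorRate=0.15
KMER_COUNT_THRESHOLD = 1
NUM_THREADS = 16
JF_SIZE = 200000000
SOAP_ASSEMBLY=0
DO_HOMOPOLYMER_TRIM=0
END
"""


def masurca_config(pe_r1, pe_r2, se_reads=(), mean=150, stdev=50):
    """Return MaSuRCA config text for one sample (hypothetical helper)."""
    se_reads = list(se_reads)
    # Assumption: prefixes must stay two characters, so s1..s9 at most.
    if len(se_reads) > 9:
        raise ValueError("at most 9 single-end libraries are supported here")

    lines = ["DATA", f"PE = pe {mean} {stdev} {pe_r1} {pe_r2}"]
    # Each single-end file is declared as its own library: s1, s2, ...
    for i, reads in enumerate(se_reads, start=1):
        lines.append(f"PE = s{i} {mean} {stdev} {reads}")
    lines.append("END")
    return "\n".join(lines) + "\n\n" + PARAMETERS


def write_config(sample, out_dir, pe_r1, pe_r2, se_reads=()):
    """Write <sample>_config.txt into out_dir and return its path."""
    path = Path(out_dir) / f"{sample}_config.txt"
    path.write_text(masurca_config(pe_r1, pe_r2, se_reads))
    return path


if __name__ == "__main__":
    print(masurca_config(
        "PEr1_trim.fastq", "PEr2_trim.fastq",
        ["SEr1_trim.fastq", "SEr2_trim.fastq"],
    ))
```

Inside the process, the generated file would then be passed to `masurca`, which per the README produces an `assemble.sh` script to run. Whether the orphans and 'unpR1'/'unpR2' files are included as single-end libraries could be decided per sample by what is passed in `se_reads`.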