masurca - generating config file on the fly

h3abionet / HPCBio-Refgraph_pipeline


masurca - generating config file on the fly #6

Closed grendon closed 3 years ago

grendon commented 4 years ago

A separate config file needs to be prepared for each sample as per https://github.com/alekseyzimin/masurca

This needs to be done inside the process itself. I am still trying to figure this part out.

yuantianhpc commented 3 years ago

I changed the config.txt file for MaSuRCA so that it can use both paired-end and single-end data. The updated config.txt is:

DATA
PE = pe 150 50 PEr1_trim.fastq PEr2_trim.fastq
PE = s1 150 50 SEr1_trim.fastq
PE = s2 150 50 SEr2_trim.fastq
END

PARAMETERS
GRAPH_KMER_SIZE = auto
USE_LINKING_MATES = 1
LIMIT_JUMP_COVERAGE = 300
CA_PARAMETERS = cgwErrorRate=0.15
KMER_COUNT_THRESHOLD = 1
NUM_THREADS = 16
JF_SIZE = 200000000
SOAP_ASSEMBLY=0
DO_HOMOPOLYMER_TRIM=0
END
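For reference, here is a minimal Python sketch of how a per-sample config like the one above could be generated on the fly. It is not the pipeline's actual code. The function names, the output file name, and the `s1`, `s2`, ... prefix scheme for extra single-end files are assumptions for illustration. The parameter values are copied from the config above.

```python
from pathlib import Path

# Parameter values copied from the config posted above. That config mixes
# "KEY = VALUE" and "KEY=VALUE"; this sketch writes the spaced form throughout.
DEFAULT_PARAMETERS = {
    "GRAPH_KMER_SIZE": "auto",
    "USE_LINKING_MATES": 1,
    "LIMIT_JUMP_COVERAGE": 300,
    "CA_PARAMETERS": "cgwErrorRate=0.15",
    "KMER_COUNT_THRESHOLD": 1,
    "NUM_THREADS": 16,
    "JF_SIZE": 200000000,
    "SOAP_ASSEMBLY": 0,
    "DO_HOMOPOLYMER_TRIM": 0,
}


def render_masurca_config(paired, singles=(), mean=150, stdev=50, parameters=None):
    """Return MaSuRCA config text for one sample (hypothetical helper).

    paired  -- (R1, R2) FASTQ paths for the paired-end library
    singles -- FASTQ paths added as single-file PE entries (s1, s2, ...)
    """
    params = dict(DEFAULT_PARAMETERS)
    params.update(parameters or {})

    lines = ["DATA", f"PE = pe {mean} {stdev} {paired[0]} {paired[1]}"]
    for i, fastq in enumerate(singles, start=1):
        if i > 9:
            # Keep every library prefix at two characters, as in the example.
            raise ValueError("this sketch supports at most 9 single-end files")
        lines.append(f"PE = s{i} {mean} {stdev} {fastq}")
    lines += ["END", "", "PARAMETERS"]
    lines += [f"{key} = {value}" for key, value in params.items()]
    lines.append("END")
    return "\n".join(lines) + "\n"


def write_masurca_config(sample_id, paired, singles=(), outdir=".", **kwargs):
    """Write the config to <outdir>/<sample_id>_masurca_config.txt and return its path."""
    path = Path(outdir) / f"{sample_id}_masurca_config.txt"
    path.write_text(render_masurca_config(paired, singles, **kwargs))
    return path
```

Calling `write_masurca_config("sampleA", ("PEr1_trim.fastq", "PEr2_trim.fastq"), ["SEr1_trim.fastq", "SEr2_trim.fastq"])` would write a file with the same DATA entries as the config above. Inside a workflow process, the same template could be filled in from the process inputs instead.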

cjfields commented 3 years ago

This task can be worked on interactively using the unaligned example data.

cjfields commented 3 years ago

May require some scripting.

cjfields commented 3 years ago

@kkowalden here is the ticket for tracking; note @yuantianhpc's comment above with an example config file. I have a few example data sets in /home/classroom/hpcbio/h3a/example_results/bowtie2-assembly/trimmed. The PE.R1/R2 reads are paired; the orphan reads are single-end and unmapped. You can also add 'unpR1' and 'unpR2' as SE, though these come from after the trimming step and so may be artifactual.

kkowalden commented 3 years ago

I have added code on my fork to generate the MaSuRCA config file on the fly and to run MaSuRCA.
It assumes two FASTQ files from paired-end reads, two single-end read FASTQ files, and one orphan read FASTQ file, all of which come out of the current bwa unaligned read extraction steps.
MaSuRCA runs without the final gap-closing step, because there are not enough extracted reads for that step to run successfully.
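As a rough illustration of the run step (not the code on the fork), the sketch below follows MaSuRCA's documented two-step usage: `masurca <config>` writes an `assemble.sh` script into the current directory, and that script is then executed. The `run_masurca` function name, the per-sample working directory, and the injectable `runner` argument are assumptions. It does not show how the gap-closing step is skipped.

```python
import subprocess
from pathlib import Path


def run_masurca(config_path, workdir, runner=subprocess.run):
    """Generate and run assemble.sh for one sample inside workdir (hypothetical helper)."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    # Resolve the config path so it stays valid after changing directory.
    config_path = Path(config_path).resolve()

    commands = [
        ["masurca", str(config_path)],  # writes assemble.sh into workdir
        ["bash", "assemble.sh"],        # runs the generated assembly script
    ]
    for cmd in commands:
        # check=True makes a failing step raise instead of continuing silently.
        runner(cmd, cwd=workdir, check=True)
    return commands
```

Running each sample in its own directory keeps the generated `assemble.sh` and intermediate files from different samples apart. The `runner` argument only exists so the sequence can be dry-run or tested without MaSuRCA installed.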