sanger-tol / readmapping

Nextflow DSL2 pipeline to align short and long reads to genome assembly. This workflow is part of the Tree of Life production suite.
https://pipelines.tol.sanger.ac.uk/readmapping
MIT License

merge bwamem_index and remove unneeded multiqc options #102

Closed tkchafin closed 3 months ago

tkchafin commented 3 months ago

PR checklist

tkchafin commented 3 months ago

@reichan1998 Good catch on the -resume issue! I think I have figured it out and fixed it. The problem was caused by special characters in the file names for the hic and illumina samples, combined with the fact that I was loading meta.file as a file-path object (nextflow.file.http.XPath) rather than a string. These special characters must be handled in some hidden way during caching, which caused the join operation to fail (i.e., the file object created from the original file and the one created from the cached file were not identical). So... after a few hours of banging my head against this, the problem was fixed by simply changing the type defined for meta.file in assets/schema_input.json:

            "datafile": {
                "format": "string",
                "pattern": "^\\S+$",
                "errorMessage": "Data file for reads cannot contain spaces and must have extension 'cram', 'bam', '.fq.gz' or '.fastq.gz'",
                "meta": ["datafile"]
            },

Can you give it a try and see if the latest commits fix the problem?
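
For anyone who wants to see what the `pattern` in that schema entry accepts, here is a minimal Python sketch. It is not part of the pipeline. It only replays the regular expression from the snippet above (`^\S+$` once the JSON escaping is removed). The function name and the sample file names are hypothetical, made up for illustration.

```python
import re

# Regex taken from the schema snippet above: JSON "^\\S+$" is the regex ^\S+$.
DATAFILE_PATTERN = re.compile(r"^\S+$")


def is_valid_datafile(value: str) -> bool:
    """Return True if the whole string is non-empty and contains no whitespace.

    fullmatch is used so that a trailing newline is rejected as well.
    """
    return DATAFILE_PATTERN.fullmatch(value) is not None


# Hypothetical file names, for illustration only.
examples = ["sample#1.cram", "my sample.cram", "reads_R1.fastq.gz", ""]

for name in examples:
    print(repr(name), is_valid_datafile(name))
```

As the sketch suggests, the pattern only rules out whitespace. A name containing other special characters still passes validation as a plain string, and the pattern itself does not check the file extensions listed in the error message.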

reichan1998 commented 3 months ago

Thank you for letting me know. It's great to hear that you have solved this problem @tkchafin ^^ I will try and get back to you later.

reichan1998 commented 3 months ago

Hi @tkchafin, I have checked, and your solution fixes the -resume issue. Happy weekend!

tkchafin commented 3 months ago

Thanks Weiwen! Can you click the "approve" button in the review so I can merge?

reichan1998 commented 3 months ago

I have already approved. Thank you, Tyler!

tkchafin commented 3 months ago

Thanks for your review!