vari-bbc / demultiplex

GNU General Public License v3.0

overrepresented barcodes by lane #11

Closed: VAIgenomics closed this issue 3 years ago

VAIgenomics commented 3 years ago

I was wondering whether we could add this feature to bcl2fastq_snake.sh. Typically, with the bcl2fastq script, text files like the following are generated in the Diagnostics folder:

Barcodes_fraction_greater_than_0.0005_L001

These are really helpful when we are trying to identify index mix-ups or other index problems. I noticed the pipeline doesn't currently generate them. It is not a high-priority issue, but it would be helpful in the future.
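
The legacy code that wrote these files is not shown in this thread, so the following is only a rough sketch of what such a report contains. It assumes Illumina 1.8+ FASTQ headers, where the index sequence is the last ':'-separated field of the header line (e.g. '1:N:0:ACGTACGT+TTGGCCAA'), and counts index sequences in one lane's Undetermined FASTQ, keeping those above a fraction threshold. The function name 'barcode_fractions', the tab-separated output columns, and the default threshold (taken from the file name above) are illustrative assumptions, not the pipeline's actual format.

```bash
#!/usr/bin/env bash
# Sketch only: report index sequences whose share of the reads in one
# lane's FASTQ exceeds a threshold. Assumes Illumina 1.8+ headers with the
# index as the last ':'-separated field, e.g.
#   @M00001:1:FC:1:1101:1000:2000 1:N:0:ACGTACGT+TTGGCCAA
# 'barcode_fractions' and its output columns are hypothetical.
set -euo pipefail

barcode_fractions() {
  local fastq_gz=$1             # one lane's (gzipped) Undetermined FASTQ
  local threshold=${2:-0.0005}  # minimum fraction of reads to report
  gzip -dc "$fastq_gz" |
    LC_ALL=C awk -F':' -v min="$threshold" '
      NR % 4 == 1 { count[$NF]++; total++ }   # header line of each record
      END {
        for (bc in count) {
          frac = count[bc] / total
          if (frac > min) printf "%s\t%d\t%.6f\n", bc, count[bc], frac
        }
      }' |
    sort -t $'\t' -k2,2nr        # most abundant index first
}
```

Running it once per lane, e.g. 'barcode_fractions Undetermined_S0_L001_R1_001.fastq.gz > Barcodes_fraction_greater_than_0.0005_L001' (input name hypothetical), gives one file per lane in which an unexpected, highly abundant index stands out at the top.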

genomics-kl commented 3 years ago

@VAIgenomics, this seems to have been caused by one of the recent changes to the pipeline. This functionality was never intentionally removed. Looking into it now.

genomics-kl commented 3 years ago

@heyyyjude, the issue seems to be due to the absence of the 'Data/Intensities/BaseCalls/Undetermined_L000_R1/2_001.fastq.gz' file. This file is typically produced by rule 'cat_fqs_undetermined'. Snakemake only runs a rule when its output is needed, directly or indirectly, by the inputs of rule 'all'. I think that because FastQC and Fastq_Screen are no longer run on these files, nothing requests that output anymore, so rule 'cat_fqs_undetermined' is not run at all. We can fix this by listing the file as an input of rule 'all'.
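
Once the file is an input of rule 'all', a dry run ('snakemake -n') should list 'cat_fqs_undetermined' among the scheduled jobs. After a real run, a quick check like the one below can confirm the files exist. This is a sketch that reads 'R1/2' as one file for R1 and one for R2 and assumes it is run from the run folder; the function name 'check_undetermined' is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: confirm the concatenated Undetermined FASTQs exist and are
# non-empty. Default path assumes the current directory is the run folder;
# 'check_undetermined' is a hypothetical helper, not part of the pipeline.
set -euo pipefail

check_undetermined() {
  local basecalls=${1:-Data/Intensities/BaseCalls}
  local mate f status=0
  for mate in R1 R2; do
    f="$basecalls/Undetermined_L000_${mate}_001.fastq.gz"
    if [[ -s "$f" ]]; then
      echo "ok: $f"
    else
      echo "missing or empty: $f" >&2
      status=1   # keep checking the other mate, but remember the failure
    fi
  done
  return "$status"
}
```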

After that, you can run the pipeline with the most recent commit and, separately, run it using a commit before the recent changes to make sure that no other files are removed except the FastQC and Fastq_Screen outputs for the Undetermined fastqs.
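
One way to make that comparison, assuming the two runs were written to separate output directories (the directory arguments are placeholders), is to list the files and symlinks that exist in only one of them. The helper name 'compare_outputs' is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: show files and symlinks present in only one of two pipeline
# output directories. Unindented lines exist only in the old run;
# tab-indented lines exist only in the new run.
set -euo pipefail

compare_outputs() {
  local old=$1 new=$2
  # comm needs both inputs sorted in the same (C) collation order;
  # paths are relative to each run directory so they line up
  LC_ALL=C comm -3 \
    <(cd "$old" && find . \( -type f -o -type l \) | LC_ALL=C sort) \
    <(cd "$new" && find . \( -type f -o -type l \) | LC_ALL=C sort)
}
```

If the change behaves as intended, the only unindented lines should be the FastQC and Fastq_Screen outputs for the Undetermined FASTQs.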

joowkim commented 3 years ago

@genomics-kl Thank you for your explanation. I will work on it.

genomics-kl commented 3 years ago

@heyyyjude Part of the problem is that some of the code in 'bcl2fastq.sh' was not migrated to Snakemake but simply copied into 'bcl2fastq_snake.sh'. The missing file in question is one of the outputs of that 'legacy' code.

It's up to you whether you want to convert that code to Snakemake. Doing so may make the pipeline easier to maintain.

joowkim commented 3 years ago

@genomics-kl Hi Kin. I did what you explained and I think I have the right outputs. I confirmed that cat_fqs_undetermined was run, and the screenshot below shows the outputs from each pipeline. (Sorry, it is hard to read on GitHub.)

The left screen is from the previous pipeline, which runs MultiQC on the undetermined FASTQ files. The middle screen is from the current pipeline, which doesn't run MultiQC on the undetermined FASTQ files. The right screen is from the updated pipeline, which doesn't run MultiQC on the undetermined FASTQ files but does run the cat_fqs_undetermined rule. I tested this with an iSeq run.

I will commit this change to the skip_fastqc_multiqc_for_undetermined branch. Please let me know if you have any comments. I will start working on the Barcodes_fraction file next. Thank you again for the explanation.

[Screenshot, 2021-06-25 12:16:20 PM: outputs of the previous, current, and updated pipelines]
genomics-kl commented 3 years ago

@heyyyjude, did you check the Diagnostics folder for the pipeline in your right-most screenshot? It is possible the 'Barcodes_fraction' file is already there.

joowkim commented 3 years ago

@genomics-kl, yes, you are right. That file is already in the Diagnostics directory. There are also symlinks to the MultiQC HTML files.

[Screenshot, 2021-06-25 1:38:32 PM]

genomics-kl commented 3 years ago

Ok, great. You can take the lead from now on and communicate with @VAIgenomics to make sure that everything is how they want it, for this and future issues. I'll be available if you want to discuss anything, but in general I'll let you decide when to merge branches into the main branch, etc. My only recommendation is to compare the output files before and after each change or commit to make sure there are no unexpected changes.
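
To check for changed contents (rather than only added or removed files), a byte-level comparison of the files the two runs have in common is one option. This is a sketch assuming two output directories from the before and after commits; the helper name 'changed_files' is hypothetical. Reports and logs that embed dates or run times will likely show up as changed in any case, so the FASTQ and barcode files are the ones worth reading closely.

```bash
#!/usr/bin/env bash
# Sketch only: print files present in both runs whose bytes differ.
# Files that exist only in the old run are skipped here; 'changed_files'
# is a hypothetical helper, not part of the pipeline.
set -euo pipefail

changed_files() {
  local old=$1 new=$2 f
  # NUL-separated paths survive spaces or newlines in file names
  while IFS= read -r -d '' f; do
    if [[ -f "$new/$f" ]] && ! cmp -s "$old/$f" "$new/$f"; then
      printf '%s\n' "$f"
    fi
  done < <(cd "$old" && find . -type f -print0 | LC_ALL=C sort -z)
}
```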

joowkim commented 3 years ago

Hello @VAIgenomics, I have updated the pipeline and tested it with an iSeq run. Sorry for the inconvenience. The fixed version is available with the command below:

git clone https://github.com/vari-bbc/demultiplex.git -b skip_fastqc_multiqc_for_undetermined

To run the fixed version, submit it with:

qsub -q genomics demultiplex/bcl2fastq_snake.sh

Please let me know if you have any errors or trouble using it. Thank you. I will merge this into the main branch once no issues or errors are found!

joowkim commented 3 years ago

Hi @VAIgenomics,

I hope you all had a great weekend. I would just like to know whether you have any issues or comments on the updated demux. If not, I would like to merge it into master, so you can just use the master branch.

Thanks

VAIgenomics commented 3 years ago

Hi @heyyyjude,

I'm sorry, but I haven't had time to test the new pipeline yet, and Marc hasn't had time to test it either. I have two S4 runs to demultiplex this week, so it may take us a little extra time to test the updates!

Becca

joowkim commented 3 years ago

@VAIgenomics

I updated master to generate a barcode fraction file and an md5sum file. Let me know if you have any comments or issues. I will close this issue.
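
For reference, a checksum file of this kind is typically written with 'md5sum' and verified by the recipient with 'md5sum -c'. The sketch below is not the code that was merged into master: the file name 'md5sum.txt', the '*.fastq.gz' pattern, and both helper names are assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: write and verify a checksum file for delivered FASTQs.
# 'md5sum.txt', the *.fastq.gz pattern, and the helper names are
# assumptions, not the pipeline's actual implementation.
set -euo pipefail

write_md5sums() {
  local dir=$1 name=${2:-md5sum.txt}
  # Relative paths keep the checksum file valid wherever the folder is copied
  (cd "$dir" && find . -type f -name '*.fastq.gz' -print0 |
     LC_ALL=C sort -z | xargs -0 -r md5sum > "$name")
}

verify_md5sums() {
  local dir=$1 name=${2:-md5sum.txt}
  # Re-hashes each listed file; --quiet prints only failures, and the
  # exit status is non-zero if any file is missing or does not match
  (cd "$dir" && md5sum -c --quiet "$name")
}
```

A recipient who has copied the run folder elsewhere can then run 'verify_md5sums <copied folder>' to confirm the transfer was intact.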