harvardinformatics / TranscriptomeAssemblyTools

A collection of scripts for processing fastq files in ways to improve de novo transcriptome assemblies, and for evaluating those assemblies.

FilterUncorrectabledPEfastq.py Run and results #12

Closed SergeyBaikal closed 7 months ago

SergeyBaikal commented 8 months ago

Dear developers!

Could you please clarify where I can find a description of how to run the script and what its output files contain? I found this command in one of the answers:

python3 FilterUncorrectabledPEfastq.py -1 Sample_R1_val_1.cor.fq.gz -2 Sample_R2_val_2.cor.fq.gz

When I run it, I get the files unfixrm_Sample_R1_val_1.cor.fastq.gz and unfixrm_Sample_R2_val_2.cor.fastq.gz. Do they contain clean reads?

And my log file, rmunfixable_None.log:

total PE reads:183086103
removed PE reads:29794547
retained PE reads:153291556
R1 corrected:59087923
R2 corrected:82233503
pairs corrected:100203512
R1 unfixable:10904169
R2 unfixable:13355522
both reads unfixable:5534856

adamfreedman commented 8 months ago

and I get the files unfixrm_Sample_R1_val_1.cor.fastq.gz and unfixrm_Sample_R2_val_2.cor.fastq.gz Do they contain clean reads?

Yes. Those files contain only reads that weren't corrected (i.e., that had no detectable errors) and reads that were corrected. The ones with unfixable errors were removed.
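To make the filtering rule concrete, here is a minimal sketch of the idea, not the script's actual code. It assumes that the upstream error corrector marks unfixable reads by putting the tag `unfixable_error` in the FASTQ header line; that tag, and the helper names `keep_pair` and `filter_pairs`, are assumptions for illustration. A pair is dropped if either mate is flagged, which is consistent with the counts in the log above.

```python
# Hypothetical tag; assumed to appear in the header of reads that could not be fixed.
UNFIXABLE_TAG = "unfixable_error"


def keep_pair(header1, header2):
    """Keep a pair only if neither mate's header carries the unfixable tag."""
    return UNFIXABLE_TAG not in header1 and UNFIXABLE_TAG not in header2


def filter_pairs(r1_records, r2_records):
    """Return (kept_pairs, removed_count) for two parallel lists of FASTQ records.

    Each record is a (header, sequence, plus_line, quality) tuple.
    """
    kept = []
    removed = 0
    for rec1, rec2 in zip(r1_records, r2_records):
        if keep_pair(rec1[0], rec2[0]):
            kept.append((rec1, rec2))
        else:
            removed += 1
    return kept, removed


# Toy records: pair 1 is untouched, pair 2 has a corrected R1, pair 3 has an unfixable R2.
r1 = [
    ("@read1/1", "ACGT", "+", "IIII"),
    ("@read2/1 cor", "ACGA", "+", "IIII"),
    ("@read3/1", "TTTT", "+", "IIII"),
]
r2 = [
    ("@read1/2", "TGCA", "+", "IIII"),
    ("@read2/2", "TCGA", "+", "IIII"),
    ("@read3/2 unfixable_error", "AAAA", "+", "IIII"),
]

kept_pairs, removed_count = filter_pairs(r1, r2)
print(len(kept_pairs), "pairs kept;", removed_count, "pair removed")
```

In this toy input the untouched pair and the corrected pair are both kept, and only the pair with a flagged mate is dropped.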

The log file is fairly self-explanatory, is it not? It details how many pairs were removed by the script and how many were kept, and it tells you how many left reads, right reads, and read pairs were corrected, how many were not fixable, etc. The number of pairs kept is important because some analyses need a minimum number of read pairs. The number that were unfixable gives you an idea of whether there was a lot of low-complexity junk in your sequencing library.
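As a quick sanity check on how the counts relate, the following sketch parses the log values posted above and verifies two identities that hold for these numbers: the three "unfixable" categories sum to the removed pairs, and total minus removed equals retained. This suggests (as an inference from the numbers, not a statement from the script's documentation) that "R1 unfixable" and "R2 unfixable" count pairs where only that mate was unfixable. The helper name `parse_log` is hypothetical.

```python
# Log values copied from the rmunfixable_None.log quoted in this thread.
LOG_TEXT = """\
total PE reads:183086103
removed PE reads:29794547
retained PE reads:153291556
R1 corrected:59087923
R2 corrected:82233503
pairs corrected:100203512
R1 unfixable:10904169
R2 unfixable:13355522
both reads unfixable:5534856
"""


def parse_log(text):
    """Parse 'label:count' lines into a dict mapping label to integer count."""
    counts = {}
    for line in text.splitlines():
        key, sep, value = line.rpartition(":")
        if sep and value.strip().isdigit():
            counts[key.strip()] = int(value)
    return counts


counts = parse_log(LOG_TEXT)

# Pairs flagged in any of the three unfixable categories.
unfixable_total = (
    counts["R1 unfixable"]
    + counts["R2 unfixable"]
    + counts["both reads unfixable"]
)
retained_fraction = counts["retained PE reads"] / counts["total PE reads"]

print("unfixable categories sum to removed pairs:",
      unfixable_total == counts["removed PE reads"])
print("total - removed == retained:",
      counts["total PE reads"] - counts["removed PE reads"]
      == counts["retained PE reads"])
print(f"fraction of pairs retained: {retained_fraction:.3f}")
```

For this library, roughly 16% of read pairs were removed, which is the figure to weigh against the minimum number of pairs a downstream analysis needs.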

Adam H. Freedman, PhD Data Scientist Faculty of Arts & Sciences Informatics Group Harvard University 38 Oxford St Cambridge, MA 02138 phone: +001 310 415 7145

