uclahs-cds / pipeline-call-NonCanonicalPeptide

Nextflow pipeline to call non-canonical peptides as custom databases for proteogenomic analysis
https://automatic-adventure-o4l96o9.pages.github.io/
GNU General Public License v2.0

don't output merged peptides if filtering is required #64

Closed: lydiayliu closed this issue 1 year ago

lydiayliu commented 2 years ago

I am running with merge_variant_noncoding = 'both', a FASTA entry, and the following filterFasta settings:

        filterFasta {

            variant_peptides {
                skip_lines = 1
                tx_id_col = 1
                quant_col = 2
                quant_cutoff = 0.1
            }
            noncoding_peptides {
                skip_lines = 1
                tx_id_col = 1
                quant_col = 2
                quant_cutoff = 0.1
            }
            merged_peptides {
                skip_lines = 1
                tx_id_col = 1
                quant_col = 2
                quant_cutoff = 0.1
            }

        }
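To make the four filterFasta settings concrete, here is a minimal sketch of how settings shaped like the block above could select expressed transcripts from an expression table. It is not the pipeline's or moPepGen's code. It assumes 1-based column indices, a tab-delimited table and an inclusive (>=) cutoff. The function name `expressed_transcripts` and the toy table are hypothetical.

```python
from typing import Iterable, Set


def expressed_transcripts(
    lines: Iterable[str],
    skip_lines: int = 1,       # header rows to skip, like skip_lines above
    tx_id_col: int = 1,        # ASSUMED 1-based column holding the transcript ID
    quant_col: int = 2,        # ASSUMED 1-based column holding the quantification
    quant_cutoff: float = 0.1,
    delimiter: str = "\t",     # ASSUMED tab-delimited table
) -> Set[str]:
    """Hypothetical helper: return transcript IDs whose quant meets the cutoff."""
    keep = set()
    for i, line in enumerate(lines):
        if i < skip_lines or not line.strip():
            continue  # skip header rows and blank lines
        fields = line.rstrip("\n").split(delimiter)
        tx_id = fields[tx_id_col - 1]        # convert 1-based index to 0-based
        quant = float(fields[quant_col - 1])
        if quant >= quant_cutoff:            # ASSUMED inclusive threshold
            keep.add(tx_id)
    return keep


# Toy table, not real data
table = [
    "transcript_id\ttpm\n",
    "tx1\t0.5\n",
    "tx2\t0.05\n",
    "tx3\t0.1\n",
]
print(sorted(expressed_transcripts(table)))  # ['tx1', 'tx3']
```

Without an expression table there is nothing to build this set from, which is the situation the second sample below runs into.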

The interesting thing about this setup is that it outputs an unfiltered merged_peptides.fasta and an unfiltered variant_peptides_summary.txt. I don't think either is necessary, since merged_peptides filtering is turned on. The variant_peptides_summary.txt presumably serves as a sanity check on the FASTA entry, but it is still redundant, because the original run that produced the FASTA would have produced the exact same summary table.

Below is a sample that has an expression table. The decoy, encode and split folders all look clean.

yiyangliu@ip-0A125212:/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-06-12_1/pipeline-meta-call-NonCanonicalPeptide-0.0.1/ACH-001075/call-NonCanonicalPeptide-1.0.0/ACH-001075/moPepGen-0.6.2/output$ ls *
ACH-001075_merged_peptides.fasta                 ACH-001075_noncoding_peptides_filtered.fasta      ACH-001075_variant_peptides_summary.txt
ACH-001075_merged_peptides_filtered.fasta        ACH-001075_variant_peptides_filtered.fasta
ACH-001075_merged_peptides_filtered_summary.txt  ACH-001075_variant_peptides_filtered_summary.txt

decoy:
ACH-001075_Coding_encode_decoy.fasta                    ACH-001075_merged_peptides_filtered_encode_decoy.fasta.dict  ACH-001075_Noncoding_encode_decoy.fasta
ACH-001075_Coding_encode_decoy.fasta.dict               ACH-001075_Noncoding-additional_encode_decoy.fasta           ACH-001075_Noncoding_encode_decoy.fasta.dict
ACH-001075_merged_peptides_filtered_encode_decoy.fasta  ACH-001075_Noncoding-additional_encode_decoy.fasta.dict

encode:
ACH-001075_Coding_encode.fasta                    ACH-001075_merged_peptides_filtered_encode.fasta.dict  ACH-001075_Noncoding_encode.fasta
ACH-001075_Coding_encode.fasta.dict               ACH-001075_Noncoding-additional_encode.fasta           ACH-001075_Noncoding_encode.fasta.dict
ACH-001075_merged_peptides_filtered_encode.fasta  ACH-001075_Noncoding-additional_encode.fasta.dict

split:
ACH-001075_Coding.fasta  ACH-001075_Noncoding-additional.fasta  ACH-001075_Noncoding.fasta
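The two redundant files in the top-level listing can be spotted mechanically from their names. The sketch below is a hypothetical heuristic based only on the naming pattern visible in this listing. It flags an unfiltered `X.fasta` or `X_summary.txt` whenever a matching `X_filtered.fasta` or `X_filtered_summary.txt` exists. The function name `redundant_unfiltered` is illustrative and not part of the pipeline.

```python
from typing import Iterable, List


def redundant_unfiltered(filenames: Iterable[str]) -> List[str]:
    """Hypothetical check: unfiltered outputs that have a *_filtered counterpart."""
    names = set(filenames)
    redundant = []
    for name in sorted(names):
        if "_filtered" in name:
            continue  # filtered outputs are the ones we want to keep
        for suffix in (".fasta", "_summary.txt"):
            if name.endswith(suffix):
                stem = name[: -len(suffix)]
                if f"{stem}_filtered{suffix}" in names:
                    redundant.append(name)
                break  # each file matches at most one suffix


    return redundant


top_level = [
    "ACH-001075_merged_peptides.fasta",
    "ACH-001075_noncoding_peptides_filtered.fasta",
    "ACH-001075_variant_peptides_summary.txt",
    "ACH-001075_merged_peptides_filtered.fasta",
    "ACH-001075_variant_peptides_filtered.fasta",
    "ACH-001075_merged_peptides_filtered_summary.txt",
    "ACH-001075_variant_peptides_filtered_summary.txt",
]
print(redundant_unfiltered(top_level))
# ['ACH-001075_merged_peptides.fasta', 'ACH-001075_variant_peptides_summary.txt']
```

A name-based check like this only works when a filtered counterpart exists. For a sample with no `_filtered` files, it flags nothing, even if filtering was requested.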

But there's a problem when the expression table is NOT available, which results in:

yiyangliu@ip-0A125212:/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-06-12_01/pipeline-meta-call-NonCanonicalPeptide-0.0.1/ACH-001039/call-NonCanonicalPeptide-1.0.0/ACH-001039/moPepGen-0.6.2/output$ ls *
ACH-001039_merged_peptides.fasta  ACH-001039_variant_peptides_summary.txt

decoy:
ACH-001039_Coding_encode_decoy.fasta           ACH-001039_merged_peptides_encode_decoy.fasta.dict       ACH-001039_Noncoding_encode_decoy.fasta
ACH-001039_Coding_encode_decoy.fasta.dict      ACH-001039_Noncoding-additional_encode_decoy.fasta       ACH-001039_Noncoding_encode_decoy.fasta.dict
ACH-001039_merged_peptides_encode_decoy.fasta  ACH-001039_Noncoding-additional_encode_decoy.fasta.dict

encode:
ACH-001039_Coding_encode.fasta       ACH-001039_merged_peptides_encode.fasta       ACH-001039_Noncoding-additional_encode.fasta       ACH-001039_Noncoding_encode.fasta
ACH-001039_Coding_encode.fasta.dict  ACH-001039_merged_peptides_encode.fasta.dict  ACH-001039_Noncoding-additional_encode.fasta.dict  ACH-001039_Noncoding_encode.fasta.dict

split:
ACH-001039_Coding.fasta  ACH-001039_Noncoding-additional.fasta  ACH-001039_Noncoding.fasta

Since this sample doesn't have an expression table, I have to assume that everything in decoy, encode and split is NOT filtered. This is misleading, because the files in those folders are not labelled as "filtered" or "unfiltered". I think nothing should be output for this sample at all?
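The behaviour proposed in this issue can be written as a small decision rule. The sketch below is hypothetical and is not the change that resolved the issue in #81. It assumes the only inputs are whether filtering was requested and whether the sample has an expression table. It returns the merged FASTA that split, encode and decoy should be built from, or `None` to output nothing. The function name `downstream_input` is illustrative.

```python
from typing import Optional


def downstream_input(filtering_requested: bool, has_expression_table: bool) -> Optional[str]:
    """Hypothetical rule: which merged FASTA feeds split/encode/decoy, if any."""
    if not filtering_requested:
        # No filterFasta settings: the unfiltered merge is the intended output.
        return "merged_peptides.fasta"
    if has_expression_table:
        # Filtering requested and possible: publish only the filtered merge,
        # not a redundant unfiltered copy.
        return "merged_peptides_filtered.fasta"
    # Filtering requested but impossible (no expression table): output nothing
    # rather than unlabeled, unfiltered files in decoy/encode/split.
    return None


for sample, has_table in [("ACH-001075", True), ("ACH-001039", False)]:
    print(sample, downstream_input(filtering_requested=True, has_expression_table=has_table))
# ACH-001075 merged_peptides_filtered.fasta
# ACH-001039 None
```

Returning `None` rather than raising an error keeps a multi-sample run going. A pipeline could equally fail loudly for such samples, and which behaviour is preferable is a design choice.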

lydiayliu commented 1 year ago

Resolved in #81