uclahs-cds / pipeline-call-NonCanonicalPeptide

Nextflow pipeline to call non-canonical peptides as custom databases for proteogenomic analysis
https://automatic-adventure-o4l96o9.pages.github.io/
GNU General Public License v2.0

fasta entry merge + split missing output #60

Closed (lydiayliu closed 2 years ago)

lydiayliu commented 2 years ago

I want to use the fasta entrypoint and do both merge and split in the pipeline, but the files I'm expecting are missing from the output:

yiyangliu@ip-0A125212:/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-11/pipeline-meta-call-NonCanonicalPeptide-0.0.1/ACH-000028/call-NonCanonicalPeptide-1.0.0/ACH-000028/moPepGen-0.6.1/output$ tree
.
├── ACH-000028_variant_peptides_summary.txt
├── decoy
│   ├── ACH-000028_merged_encode_decoy.fasta
│   └── ACH-000028_merged_encode_decoy.fasta.dict
└── encode
    ├── ACH-000028_merged_encode.fasta
    └── ACH-000028_merged_encode.fasta.dict

2 directories, 5 files
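
One quick way to confirm what is missing is to check the output directory for the subdirectories seen in complete runs. The sketch below is a hypothetical helper (`missing_subdirs` is not part of the pipeline). It assumes a complete merge + split run should contain `split`, `encode`, and `decoy`, which is inferred from the other trees in this thread rather than from the pipeline documentation.

```shell
# Hypothetical helper, not part of the pipeline.
# Prints the name of each expected output subdirectory that is absent.
# The names split/encode/decoy are an assumption taken from the trees in this thread.
missing_subdirs() {
    out_dir="$1"
    for sub in split encode decoy; do
        [ -d "${out_dir}/${sub}" ] || echo "${sub}"
    done
}
```

Called as `missing_subdirs /path/to/output` (placeholder path), it prints nothing when all three subdirectories exist. For the tree above it would print `split`, since only `decoy` and `encode` are present.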

Here's the current config

// External config files import. DO NOT MODIFY THESE LINES!
includeConfig "${projectDir}/config/default.config"
includeConfig "${projectDir}/config/methods.config"
includeConfig "${projectDir}/nextflow.config"

params {

    executor = 'slurm'

    // input_csv = "/path/to/input.csv"
    // output_dir = "/path/to/output/dir"
    // exprs_table_csv = null
    // variant_fasta_csv = "/path/to/variant_fasta.csv"
    save_intermediate_files = false
    ucla_cds = true

    partition = 'F72'
    clusterOptions = '--exclusive'
    max_parallel_jobs = 1
    samples_per_job = 10
    call_variant_ncpus = 16
    call_variant_memory_GB = 32.GB

    // meta_work_dir = null
    pipeline_work_dir = '/scratch/'

    // see https://github.com/uclahs-cds/pipeline-call-NonCanonicalPeptide#config
    moPepGen {

        entrypoint = 'fasta'
        filter_fasta = false
        split_fasta = true
        encode_fasta = true
        decoy_fasta = true

        index_dir = '/hot/users/yiyangliu/MoPepGen/Index/GRCh38-EBI-GENCODE34/'

        noncoding_peptides = '/hot/project/algorithm/moPepGen/ref/GRCh38-EBI-GENCODE34/gencode34_default_noncoding_peptides.fa'

        merge_variant_noncoding = 'yes'

        splitFasta {

            order_source = 'Mutation,Fusion,Coding,Noncoding'
            group_source = 'Coding:Mutation,Fusion'
            max_source_groups = 1
            additional_split = 'Noncoding'

        }

        summarizeFasta {

            order_source = 'Mutation,Fusion,Coding,Noncoding,Mutation-Fusion,Mutation-Noncoding,Fusion-Noncoding,Mutation-Fusion-Noncoding'
            ignore_missing_source = true

        }

        decoyFasta {

            decoy_string = 'DECOY_'
            decoy_string_position = 'prefix'

        }

    }

}

// Setup the pipeline config. DO NOT REMOVE THIS LINE!
methods.setup()

Command

outdir=/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/
cd ${outdir}
sbatch --partition F2 --exclusive -J CCLE-test-fasta \
    --mail-type ALL --mail-user YiyangLiu@mednet.ucla.edu \
    -o ${outdir}/log/CCLE-test-fasta.log -e ${outdir}/log/CCLE-test-fasta.err \
    --wrap "nextflow run /hot/users/yiyangliu/project-MissingPeptides-Method/pipelines/pipeline-meta-call-NonCanonicalPeptide/main.nf \
        -c /hot/users/yiyangliu/project-MissingPeptides-Method/src/call-noncanonical/CCLE/meta-call-NonCanonicalPeptide_merge_split.config \
        --input_csv ${outdir}/gvf_csvs/CCLE-test.csv \
        --variant_fasta_csv ${outdir}/variant_fasta_csvs/CCLE-test.csv \
        --exprs_table_csv ${outdir}/exprs_table_csvs/CCLE-test.csv \
        --output_dir ${outdir}/2022-05-11/ \
        --meta_work_dir ${outdir}/work_test"

lydiayliu commented 2 years ago

Btw, this is what the output looks like for the fasta entrypoint with the same config, except with merge_variant_noncoding = 'no':

yiyangliu@ip-0A125212:/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-10/pipeline-meta-call-NonCanonicalPeptide-0.0.1/ACH-000028/call-NonCanonicalPeptide-1.0.0/ACH-000028/moPepGen-0.5.1/output$ tree
.
├── ACH-000028_variant_peptides_summary.txt
├── decoy
│   ├── ACH-000028_Coding_encode_decoy.fasta
│   ├── ACH-000028_Coding_encode_decoy.fasta.dict
│   ├── ACH-000028_Noncoding-additional_encode_decoy.fasta
│   ├── ACH-000028_Noncoding-additional_encode_decoy.fasta.dict
│   ├── ACH-000028_Noncoding_encode_decoy.fasta
│   └── ACH-000028_Noncoding_encode_decoy.fasta.dict
├── encode
│   ├── ACH-000028_Coding_encode.fasta
│   ├── ACH-000028_Coding_encode.fasta.dict
│   ├── ACH-000028_Noncoding-additional_encode.fasta
│   ├── ACH-000028_Noncoding-additional_encode.fasta.dict
│   ├── ACH-000028_Noncoding_encode.fasta
│   └── ACH-000028_Noncoding_encode.fasta.dict
└── split
    ├── ACH-000028_Coding.fasta
    ├── ACH-000028_Noncoding-additional.fasta
    └── ACH-000028_Noncoding.fasta

Also, the fasta entrypoint with merge_variant_noncoding = 'both' and filtering gives the correct outputs:

yiyangliu@ip-0A125212:/hot/project/algorithm/moPepGen/CCLE/processed/noncanonical-database/call-nonCanonicalPeptide/GRCh38-EBI-GENCODE34/2022-05-30_filter_split/pipeline-NonCanonicalPeptide-0.0.1/ACH-000005/call-NonCanonicalPeptide-1.0.0/ACH-000005/moPepGen-0.6.1/output$ tree
.
├── ACH-000005_merged_peptides_filtered.fasta
├── ACH-000005_merged_peptides_filtered_summary.txt
├── ACH-000005_noncoding_peptides_filtered.fasta
├── ACH-000005_variant_peptides_filtered.fasta
├── ACH-000005_variant_peptides_filtered_summary.txt
├── ACH-000005_variant_peptides_summary.txt
├── decoy
│   ├── ACH-000005_Coding_encode_decoy.fasta
│   ├── ACH-000005_Coding_encode_decoy.fasta.dict
│   ├── ACH-000005_merged.fasta
│   ├── ACH-000005_merged_peptides_filtered_encode_decoy.fasta
│   ├── ACH-000005_merged_peptides_filtered_encode_decoy.fasta.dict
│   ├── ACH-000005_Noncoding-additional_encode_decoy.fasta
│   ├── ACH-000005_Noncoding-additional_encode_decoy.fasta.dict
│   ├── ACH-000005_Noncoding_encode_decoy.fasta
│   └── ACH-000005_Noncoding_encode_decoy.fasta.dict
├── encode
│   ├── ACH-000005_Coding_encode.fasta
│   ├── ACH-000005_Coding_encode.fasta.dict
│   ├── ACH-000005_merged_peptides_filtered_encode.fasta
│   ├── ACH-000005_merged_peptides_filtered_encode.fasta.dict
│   ├── ACH-000005_Noncoding-additional_encode.fasta
│   ├── ACH-000005_Noncoding-additional_encode.fasta.dict
│   ├── ACH-000005_Noncoding_encode.fasta
│   └── ACH-000005_Noncoding_encode.fasta.dict
└── split
    ├── ACH-000005_Coding.fasta
    ├── ACH-000005_Noncoding-additional.fasta
    └── ACH-000005_Noncoding.fasta
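
Since the runs in this thread use different samples, comparing their trees by eye is error-prone. The following is a minimal sketch for listing files that one output tree has and another lacks. The function names `list_outputs` and `missing_outputs` are hypothetical, and the sketch assumes every output file name starts with `<sample>_`, as in the trees shown here.

```shell
# Hypothetical helpers, not part of the pipeline.

# List files under an output directory, relative to it, with the
# "<sample>_" prefix removed so trees from different samples line up.
list_outputs() {
    out_dir="$1"
    sample="$2"
    (cd "$out_dir" && find . -type f | sed "s|/${sample}_|/|" | LC_ALL=C sort)
}

# Print files present in the reference tree but absent from the candidate tree.
# Arguments: <ref_dir> <ref_sample> <candidate_dir> <candidate_sample>
missing_outputs() {
    ref_list=$(mktemp)
    cand_list=$(mktemp)
    list_outputs "$1" "$2" > "$ref_list"
    list_outputs "$3" "$4" > "$cand_list"
    # comm -23 keeps lines unique to the first (reference) list.
    LC_ALL=C comm -23 "$ref_list" "$cand_list"
    rm -f "$ref_list" "$cand_list"
}
```

For example, `missing_outputs /path/to/good/output ACH-000005 /path/to/bad/output ACH-000028` (placeholder paths) would list the files that the run with correct outputs has and the merge-only run does not.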

zhuchcn commented 2 years ago

Seems like the pipeline logic is a little off for merge. Btw, do we want to implement this logic in the pipeline?

https://crispy-invention-072327aa.pages.github.io/filter-fasta/#complex-filtering

lydiayliu commented 2 years ago

I suppose we can add the complex filtering to the bucket list, I don't really see any urgent use for it yet!

The merge logic is more important for me right now XD