Closed zhuchcn closed 1 year ago
Why splitFiltered but not split_filtered to match the rest of the formats?
Can you also try a sample where split_fasta = true and filter_fasta = true but no expression table is provided? Nothing should be output in split_filtered. I also don't think anything should be output in split?
Why splitFiltered but not split_filtered to match the rest of the formats?
Done! Here is what it looks like right now with merge_variant_noncoding = both, split_fasta = true, and filter_fasta = true:
test/output/test-integration-merge/call-NonCanonicalPeptide-1.0.0/UCLA0001/moPepGen-0.11.3/output/
├── decoy
│ ├── UCLA0001_merged_peptides_filtered_encode_decoy.fasta
│ ├── UCLA0001_merged_peptides_filtered_encode_decoy.fasta.dict
│ ├── UCLA0001_split_circRNA_encode_decoy.fasta
│ ├── UCLA0001_split_circRNA_encode_decoy.fasta.dict
│ ├── UCLA0001_split_circRNA-RNAEditingSite_encode_decoy.fasta
│ ├── UCLA0001_split_circRNA-RNAEditingSite_encode_decoy.fasta.dict
│ ├── UCLA0001_split_Fusion_encode_decoy.fasta
│ ├── UCLA0001_split_Fusion_encode_decoy.fasta.dict
│ ├── UCLA0001_split_Fusion-Noncoding_encode_decoy.fasta
│ ├── UCLA0001_split_Fusion-Noncoding_encode_decoy.fasta.dict
│ ├── UCLA0001_split_gINDEL-circRNA_encode_decoy.fasta
│ ├── UCLA0001_split_gINDEL-circRNA_encode_decoy.fasta.dict
│ ├── UCLA0001_split_gINDEL_encode_decoy.fasta
│ ├── UCLA0001_split_gINDEL_encode_decoy.fasta.dict
│ ├── UCLA0001_split_gSNP_encode_decoy.fasta
│ ├── UCLA0001_split_gSNP_encode_decoy.fasta.dict
│ ├── UCLA0001_split_Noncoding_encode_decoy.fasta
│ ├── UCLA0001_split_Noncoding_encode_decoy.fasta.dict
│ ├── UCLA0001_split_RNAEditingSite_encode_decoy.fasta
│ └── UCLA0001_split_RNAEditingSite_encode_decoy.fasta.dict
├── encode
│ ├── UCLA0001_merged_peptides_filtered_encode.fasta
│ ├── UCLA0001_merged_peptides_filtered_encode.fasta.dict
│ ├── UCLA0001_split_circRNA_encode.fasta
│ ├── UCLA0001_split_circRNA_encode.fasta.dict
│ ├── UCLA0001_split_circRNA-RNAEditingSite_encode.fasta
│ ├── UCLA0001_split_circRNA-RNAEditingSite_encode.fasta.dict
│ ├── UCLA0001_split_Fusion_encode.fasta
│ ├── UCLA0001_split_Fusion_encode.fasta.dict
│ ├── UCLA0001_split_Fusion-Noncoding_encode.fasta
│ ├── UCLA0001_split_Fusion-Noncoding_encode.fasta.dict
│ ├── UCLA0001_split_gINDEL-circRNA_encode.fasta
│ ├── UCLA0001_split_gINDEL-circRNA_encode.fasta.dict
│ ├── UCLA0001_split_gINDEL_encode.fasta
│ ├── UCLA0001_split_gINDEL_encode.fasta.dict
│ ├── UCLA0001_split_gSNP_encode.fasta
│ ├── UCLA0001_split_gSNP_encode.fasta.dict
│ ├── UCLA0001_split_Noncoding_encode.fasta
│ ├── UCLA0001_split_Noncoding_encode.fasta.dict
│ ├── UCLA0001_split_RNAEditingSite_encode.fasta
│ └── UCLA0001_split_RNAEditingSite_encode.fasta.dict
├── split_filtered
│ ├── UCLA0001_split_circRNA.fasta
│ ├── UCLA0001_split_circRNA-RNAEditingSite.fasta
│ ├── UCLA0001_split_Fusion.fasta
│ ├── UCLA0001_split_Fusion-Noncoding.fasta
│ ├── UCLA0001_split_gINDEL-circRNA.fasta
│ ├── UCLA0001_split_gINDEL.fasta
│ ├── UCLA0001_split_gSNP.fasta
│ ├── UCLA0001_split_Noncoding.fasta
│ └── UCLA0001_split_RNAEditingSite.fasta
├── UCLA0001_merged_peptides.fasta
├── UCLA0001_merged_peptides_filtered.fasta
├── UCLA0001_merged_peptides_filtered_summary.txt
├── UCLA0001_merged_peptides_summary.txt
├── UCLA0001_noncoding_peptides_filtered.fasta
├── UCLA0001_variant_peptides.fasta
├── UCLA0001_variant_peptides_filtered.fasta
├── UCLA0001_variant_peptides_filtered_summary.txt
└── UCLA0001_variant_peptides_summary.txt
As discussed, we want:
- [...] (filter_fasta = TRUE), in decoy / encode directories have split and split_filtered subdirectories
- [...] (filter_fasta = TRUE)
- [...] (filter_fasta = TRUE)

We want to call encodeFasta and decoyFasta on both the unfiltered and filtered fasta. We currently allow users to turn off the encode/decoy functions. Should we get rid of params.encode_fasta and params.decoy_fasta, so they will always be called? This just makes the logic a little bit simpler.
No, for CCLE I specifically turn off encode_fasta and decoy_fasta because I literally don't need all the encode and decoy fastas floating around... they can't be used as input to merge.
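The outcome of this exchange (encode and decoy are applied to both the unfiltered and the filtered FASTA, but each stays optional) can be sketched as a small planning function. This is a Python illustration only, not the pipeline's Nextflow code: `plan_steps` and its return format are hypothetical, the flag names simply mirror params.filter_fasta, params.encode_fasta and params.decoy_fasta from the discussion, and any chaining between the encode and decoy steps is not modeled.

```python
def plan_steps(filter_fasta: bool, encode_fasta: bool, decoy_fasta: bool) -> list[tuple[str, str]]:
    """Return (step, input) pairs in the order they would be scheduled.

    Hypothetical helper: it only mirrors the rule discussed above, namely that
    encodeFasta and decoyFasta apply to both the unfiltered and the filtered
    FASTA, and that each of them can be switched off independently.
    """
    # The unfiltered FASTA always exists; the filtered one only if filtering ran.
    inputs = ["unfiltered"]
    if filter_fasta:
        inputs.append("filtered")

    steps = []
    for fasta in inputs:
        if encode_fasta:
            steps.append(("encodeFasta", fasta))
        if decoy_fasta:
            steps.append(("decoyFasta", fasta))
    return steps


# CCLE-style run: filtering on, encode/decoy off, so no extra FASTAs are planned.
print(plan_steps(filter_fasta=True, encode_fasta=False, decoy_fasta=False))
```

Keeping the two flags means a run like the CCLE one produces no encode or decoy files at all, at the cost of one extra conditional per step.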
I think I implemented it the way you want. There are too many output files, so the complete tree output probably won't fit here. I put the directory structures for the fasta, gvf and parser entrypoints in the file below on the cluster so you can take a look.
/hot/user/czhu/pipeline-call-NoncanonicalPeptide/tree_output.txt
I also changed the parameter from merge_variant_noncoding to database_processing_modes, which I think is more reasonable.
We are almost there!! Commenting on the output files:
For the fasta entry point or process_unfiltered_fasta = FALSE, we also don't need to output the unfiltered merged.fasta
├── UCLA0001_merged_peptides.fasta
├── UCLA0001_merged_peptides_filtered.fasta
├── UCLA0001_merged_peptides_filtered_summary.txt
├── UCLA0001_noncoding_peptides_filtered.fasta
├── UCLA0001_variant_peptides_filtered.fasta
├── UCLA0001_variant_peptides_filtered_summary.txt
└── variant_summary.txt
Although I suspect that it is easier to output it than not to output it... It doesn't hurt, but it is a bit of a waste as a duplicated file.
Otherwise, noncoding_peptides_filtered.fasta would now only be output as part of split, right?
In the last commit, I added a tag of 'variant_only' to the summarizeFasta output from the 'plain' workflow. Also updated the 'tree_output.txt'. Let me know what you think!
It is a bit long-winded but I think it works.
├── UCLA0001_variant_peptides.fasta
├── UCLA0001_variant_peptides_filtered.fasta
├── UCLA0001_variant_peptides_filtered_summary.txt
├── UCLA0001_variant_peptides_filtered_variant_only_summary.txt
└── UCLA0001_variant_peptides_summary.txt
I don't like how the unfiltered summary is just called _variant_peptides_summary.txt but it matches with the fasta. Let's just keep it like this!
I don't like how the unfiltered summary is just called _variant_peptides_summary.txt but it matches with the fasta. Let's just keep it like this!
That's what I was thinking, too, so they can match up.
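The naming agreed on here, where each summary file shares its stem with the FASTA it describes, can be sketched as follows. This is a Python illustration inferred from the file names listed above, not code from the pipeline; `fasta_name` and `summary_name` are hypothetical helpers.

```python
def fasta_name(sample: str, kind: str, filtered: bool = False) -> str:
    """Build a FASTA file name such as UCLA0001_variant_peptides_filtered.fasta."""
    parts = [sample, kind, "peptides"]
    if filtered:
        parts.append("filtered")
    return "_".join(parts) + ".fasta"


def summary_name(sample: str, kind: str, filtered: bool = False, tag: str = "") -> str:
    """Build the summary file name that pairs with fasta_name().

    The optional tag (for example 'variant_only') is inserted just before the
    '_summary.txt' suffix, as in the listing above.
    """
    parts = [sample, kind, "peptides"]
    if filtered:
        parts.append("filtered")
    if tag:
        parts.append(tag)
    parts.append("summary")
    return "_".join(parts) + ".txt"


print(fasta_name("UCLA0001", "variant"))
print(summary_name("UCLA0001", "variant"))
print(summary_name("UCLA0001", "variant", filtered=True, tag="variant_only"))
```

Because both helpers build on the same stem, the unfiltered summary ends up as _variant_peptides_summary.txt, which is the match-up with the FASTA that both commenters settled on.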
splitFasta output dir is changed to splitFiltered if 'filterFasta' is used, otherwise still 'filter'. Example output below with merge_variant_noncoding set to 'both'.
Closes #80
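The directory rule described above can be sketched as a one-line selection. This is a hedged Python illustration, not the pipeline's Nextflow code: `split_output_dir` is a hypothetical name, and it assumes the snake_case directory name split_filtered requested in review, with split as the name for the unfiltered case as used in the review comments.

```python
def split_output_dir(filter_fasta: bool) -> str:
    """Pick the splitFasta output directory.

    Assumption: 'split_filtered' when the input was filtered, 'split' otherwise.
    These names follow the review comments rather than any confirmed config.
    """
    return "split_filtered" if filter_fasta else "split"


print(split_output_dir(True))
print(split_output_dir(False))
```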
[X] I have read the code review guidelines and the code review best practice on GitHub check-list.
[X] The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
[X] I have set up or verified the branch protection rule following the github standards before opening this pull request.
[X] I have added my name to the contributors listings in the metadata.yaml and the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
[ ] I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
[ ] I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)
[X] All test cases have passed.
Closes #...