snijderlab / stitch

Template-based assembly of proteomics short reads for de novo antibody sequencing and repertoire profiling
MIT License
Folder search #240

Closed MengTingHe2023 closed 10 months ago

MengTingHe2023 commented 11 months ago

Hi Schulte, currently I can select a folder as input and attach extra parameters for Peaks files. If I want to use result files from several different software tools (such as pNovo and Casanovo) placed in one folder as input, and also set parameters like CutoffScore, how should I proceed?

douweschulte commented 11 months ago

Personally, I have not used the Folder option in a long while. There is no way to set any of the parameters you mentioned for a Folder input. Additionally, pNovo and Casanovo files will not be opened if you use the Folder input, because it assumes all .csv files are Peaks files. I wrote the Folder input before I added support for any of the other file formats. Your best way forward for now is to list all files separately. If you keep using it this way and extending the Folder input option would help you, feel free to reach out again so that we can discuss exactly how it should behave.
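Listing every file by hand gets tedious for a large folder, so the entries can be generated instead. The following is a minimal sketch, not part of Stitch: it assumes the Peaks block layout that appears later in this thread and uses only the keys shown there (Path, Format, Name, CutoffALC). The function names `peaks_entry`, `input_section`, and `csv_files` are hypothetical helpers introduced for illustration.

```python
from pathlib import Path


def peaks_entry(path, name, cutoff_alc, peaks_format="Ab"):
    """Render one Peaks block, using only keys that appear in this thread."""
    lines = [
        "    Peaks ->",
        f"        Path     : {path}",
        f"        Format   : {peaks_format}",
        f"        Name     : {name}",
        f"        CutoffALC: {cutoff_alc}",
        "    <-",
    ]
    return "\n".join(lines)


def input_section(paths, cutoff_alc):
    """Render a whole Input section with one Peaks block per file.

    Names are simply 01, 02, ... in the order the paths are given.
    """
    blocks = [
        peaks_entry(str(p), f"{i:02d}", cutoff_alc)
        for i, p in enumerate(paths, start=1)
    ]
    return "\n".join(["Input ->", *blocks, "<-"])


def csv_files(folder):
    """List the .csv files in a folder in a stable (sorted) order."""
    return sorted(Path(folder).glob("*.csv"))
```

Calling `print(input_section(csv_files("../datasets"), cutoff_alc=11))` would print a section that can be pasted into the batch file. Files from other tools would still need their own blocks written by hand, since this sketch only covers the Peaks layout.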

MengTingHe2023 commented 11 months ago

Thank you for your response. I would like to use data from multiple enzyme cleavages together as input, which might provide better coverage. I've thought of two possible solutions: one is to merge the de novo sequencing data from the multiple enzyme cleavages into a single input file, and the other is to first convert the de novo sequencing data into the reads format. Since the second option would lose the relevant scoring information, we need to confirm whether Stitch removes reads below the cutoff score. Additionally, Peaks has a LocalCutoffALC in the Stitch parameters, but pNovo and Casanovo do not. I'd like to ask whether reads below the LocalCutoffALC are removed entirely. Once these details are clarified, we can apply the cutoff ourselves during the conversion to the reads format. Do you think this approach is feasible?

douweschulte commented 11 months ago

First, one thing: you can have multiple input files in a single Stitch run. Just list them one after another in the input section. Note that you can change any parameter you want for each input specification, and combine any number of files from any of the formats:

Input ->
    Peaks ->
        Path     : ../datasets/200305_HER_test_04_DENOVO.csv
        Format   : Ab
        Name     : 01
        CutoffALC: 11
        -RawDataDirectory: R:\F1\peng0013\201912
        -XleDisambiguation: True
    <-
    Peaks ->
        Path     : ../datasets/200305_HER_test_05_DENOVO.csv
        Format   : Ab
        Name     : 02
        CutoffALC: 11
        -RawDataDirectory: R:\F1\peng0013\201912
        -XleDisambiguation: True
    <-
    Peaks ->
        Path     : ../datasets/200305_HER_test_06_DENOVO.csv
        Format   : Ab
        Name     : 03
        CutoffALC: 11
        -RawDataDirectory: R:\F1\peng0013\201912
        -XleDisambiguation: True
    <-
<-

So there should be no need for any file concatenation outside of Stitch. Stitch fully ignores any reads below the CutoffALC/CutoffScore: they will not be shown anywhere. You could call that deletion, but your files are not touched in any way. The LocalCutoffALC is used for the Peaks format to retrieve smaller patches out of longer sequences that as a whole do not score well enough to be included, but that contain shorter stretches of amino acids that all score above the LocalCutoffALC (with a minimal length controlled by MinLengthPatch). These smaller patches are then used as normal peptides. This is only implemented for Peaks because at the time of implementing we only used Peaks. We have not touched this option in a long time, as it did not really seem to help our results, so I did not implement it for any of the other formats.
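To make the cutoff and patch behaviour described above concrete, here is a minimal sketch of the idea in Python. It is not Stitch's actual implementation: the function names are hypothetical, and it assumes one local score per residue, that a score equal to a cutoff counts as passing, and that patches are only extracted when the full read fails the overall cutoff.

```python
def local_patches(sequence, scores, local_cutoff, min_length):
    """Return contiguous stretches in which every residue passes the local cutoff.

    Assumption: "passes" means score >= local_cutoff. Stretches shorter
    than min_length are dropped.
    """
    if len(sequence) != len(scores):
        raise ValueError("need exactly one score per residue")
    patches = []
    start = None  # index where the current passing stretch began
    for i, score in enumerate(scores):
        if score >= local_cutoff:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                patches.append(sequence[start:i])
            start = None
    # Close a stretch that runs to the end of the read.
    if start is not None and len(scores) - start >= min_length:
        patches.append(sequence[start:])
    return patches


def usable_reads(sequence, scores, overall_score, cutoff, local_cutoff, min_length):
    """Keep the full read if it passes the overall cutoff, else fall back to patches."""
    if overall_score >= cutoff:
        return [sequence]
    return local_patches(sequence, scores, local_cutoff, min_length)
```

For example, a read `EVQLVESGG` with local scores `[90, 85, 80, 30, 88, 92, 95, 20, 99]`, a local cutoff of 80, and a minimal patch length of 3 yields the patches `EVQ` and `VES`; the final `G` passes the cutoff but is too short to keep. When converting other formats to plain reads, only the overall cutoff step applies, because the patch logic exists for Peaks alone.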

MengTingHe2023 commented 11 months ago

Thank you very much for your help and guidance. I understand your suggestions, and I will try the methods you mentioned above. Your assistance is greatly appreciated.