proteomicsunitcrg / qcloud2-pipeline

QCloud2 pipeline

Run current EBI pipeline for toy dataset #42

Closed rolivella closed 4 years ago

rolivella commented 4 years ago
| ID | Filename | Checksum | Annotation |
|----|----------|----------|------------|
| 1 | 180130_Q_QC2F_01_01_100ng | e8a53e5831b4991d90eebb5e8c239fb0 | None |
| 2 | 170815_Q_QC2F_01_06_100ng | 654c5180ce42dab7b9ee2ade6351c2da | None |
| 3 | 170712_Q_QC2F_01_05_100ng | 6b7c25657eed1a640dd61ff6de67b049 | None |
| 4 | 170307_Q_QC2F_01_01_100ng | 0c966e2777705e7972f06be05586435b | Column changed & Pre-column changed |
| 5 | 170118_Q_QC2F_01_01_100ng | 2d80fd7bb43f3b7c11abe5bee3aa6206 | None |
| 6 | 180814_Q_QC2V_01_01_100ng | 1e39d5856e2a837e3ac8c71c879aa3c8 | None |
| 7 | 180529_Q_QC2V_01_01_100ng | c143f81073a7250f640c756211b4767d | None |
| 8 | 180306_Q_QC2V_01_02_100ng | 55525df97ae6a4d80a8f316985b92642 | None |
| 9 | 170526_Q_QC2V_01_01_100ng | c07544d808880c48fcad9a8f915f44bb | MS calibration |
| 10 | 170307_Q_QC2V_01_02_100ng | e7d0cbff7baff24b1928fbfe9d1afa8a | Column changed & Pre-column changed |
| 11 | 170109_Q_QC2V_01_01_100ng | b12f8a8457763d222c51561e8d2b2bf9 | LC maintenance & MS maintenance |
| 12 | 180409_Q_QC2F_01_01_100ng | 6e3a7e470f328fa35d1738e043d52d3d | None |
| 13 | 180725_Q_QC2F_01_02_100ng | a668f29b5228fef3cdef216bf16dea9a | None |
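
The checksums above are 32 hexadecimal characters long, which is consistent with MD5, and the same values appear later in this thread as folder names of the form `QC02_<checksum>`. A minimal sketch of how such a value could be reproduced, assuming (not confirmed here) that the checksum is the MD5 of the raw file's bytes; `md5_of_file` and the `sample.raw` file are hypothetical names used only for illustration:

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large instrument files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file (stand-in for a real RAW file).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.raw"
    sample.write_bytes(b"hello")
    checksum = md5_of_file(sample)
    # Folder naming as seen in this thread; the MD5 origin is an assumption.
    folder_name = "QC02_" + checksum
```

If the pipeline derives its checksum differently (for example from the file name or another hash), the values will not match the table.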
rolivella commented 4 years ago

/users/pr/qcloud/test/elixir_proteomics_QC_current/output

rolivella commented 4 years ago

Submitted to PRIDE:

elixir-submission.px.tar.gz

IMPORTANT: duplicate mzML and ok.mzML

rolivella commented 4 years ago

Also remove "oxibutanol" from variable modifications.

rolivella commented 4 years ago

Test with RAW files from other instruments.

rolivella commented 4 years ago

In summary, modifications and checks to do:

rolivella commented 4 years ago

Script to generate px file: BulkPRIDESubmission --folder /users/pr/qcloud/test/pride_submission/files /users/pr/qcloud/test/pride_submission/output

rolivella commented 4 years ago
Status: Resolving address of ftp-private.ebi.ac.uk
Status: Connecting to 193.62.194.179:21...
Status: Connection established, waiting for welcome message...
Status: Initializing TLS...
Status: Verifying certificate...
Status: TLS connection established.
Command:    USER roger.olivella@crg.eu
Response:   530 Permission denied.

Current configuration: ebiftp

rolivella commented 4 years ago

Mail to Mathias:

rolivella commented 4 years ago

Pending:

rolivella commented 4 years ago

Test EBI FTP account

How to update Mathias script: https://github.com/proteomicsunitcrg/qcloud2-pipeline/issues/45#issuecomment-600502115

According to Mathias:

1) How to access:

1) Still permission denied. Can you login to the FTP with my credentials? Ah, I think the 'issue' was that there are two different FTPs for EBI's PRIDE, IDK why. Your credentials are for the other server, I think. Please do the following: update the tool, container/pip either is fine, and use the pw I provided. Don't worry if login with filezilla is not working. Try with the tool.

2) New password:

ivZ4PK9k with your user handle and folder CRG_bulk_PX

3) Script use:

No need to implement, you just need to tell the tool which folders. Here an example: BulkPRIDESubmission --folder /files/folder/instrumentX/2018/ --folder /files/folder/instrumentY/2018/ --folder /files/folder/instrumentX/2019/

Conclusion:

1) The FTP user and password are working:

2) Starting info could be stored in a md file?

3) I don't know exactly what to put in the --folder param. If I put the output folder I get:

Collecting metadata from files within the given folders.
WARNING:root:Ignoring these directories: files/QC02_6b7c25657eed1a640dd61ff6de67b049,files/QC02_b12f8a8457763d222c51561e8d2b2bf9
Enter provided ftp password: ********                                                                                                                  
Enter provided ftp folder name: CRG_bulk_PX                                                                                                            
Uploading 0 files...                                                                                                                                   
   0.0% [=====================================================================================================================>]   0/  ? eta [?:??:??] 
Done. Thank you for choosing BulkPRIDESubmission. Have a great day!
ERROR:asyncio:Task was destroyed but it is pending!
task: <Task pending coro=<Renderer.wait_for_cpr_responses.<locals>.wait_for_timeout() done, defined at /users/pr/qcloud/.local/lib/python3.6/site-packages/prompt_toolkit/renderer.py:504> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f3ab24a87c8>()]>>

If I put the folder where the files are I get:

Collecting metadata from files within the given folders.
Traceback (most recent call last):
  File "/users/pr/qcloud/.local/bin/BulkPRIDESubmission", line 11, in <module>
    load_entry_point('BulkPrideSubmission==0.0.1', 'console_scripts', 'BulkPRIDESubmission')()
  File "/users/pr/qcloud/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/users/pr/qcloud/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/users/pr/qcloud/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/users/pr/qcloud/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/users/pr/qcloud/.local/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/users/pr/qcloud/.local/lib/python3.6/site-packages/cli/bulk_pride_submission.py", line 318, in start
    logging.warning("Ignoring these unmatched files: {}".format(','.join(unassociated)))
TypeError: sequence item 2: expected str instance, set found
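
The `TypeError` above is what `str.join` raises when one element of the sequence is not a string: here the third item of `unassociated` is a `set`. The sketch below reproduces the error and shows one way the list could be flattened before joining. The file names and the `flatten_names` helper are hypothetical, and this is not the tool's actual fix:

```python
def flatten_names(items):
    """Flatten a list that mixes strings and sets of strings into a list of strings."""
    flat = []
    for item in items:
        if isinstance(item, str):
            flat.append(item)
        else:
            # Sort set members so the resulting message is deterministic.
            flat.extend(sorted(item))
    return flat


# Hypothetical stand-in for the tool's `unassociated` list: item 2 is a set.
unassociated = ["a.mzid", "b.mzQC", {"c.qcml", "c.featureXML"}]

try:
    ",".join(unassociated)
    message = ""
except TypeError as exc:
    message = str(exc)  # same wording as in the traceback above

joined = ",".join(flatten_names(unassociated))
```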

According to Mathias:

The tool is designed to look into each given (--folder) folder for files to submit. It will not look into sub-folders. When there are name-matching mzid and raw files, the tool will register all other name-matching files of other types (such as mzQC), too.
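
The behaviour described by Mathias can be sketched roughly as follows. This is an illustration of the stated rules only (no sub-folders; a base name qualifies when it has both an mzid and a raw file; other same-named files come along), not the tool's code. `collect_groups`, `submittable`, and the `run1`/`run2`/`run3` file names are hypothetical:

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def collect_groups(folder):
    """Map base name -> set of lowercase extensions for files directly in `folder`."""
    groups = defaultdict(set)
    for entry in Path(folder).iterdir():  # iterdir() does not descend into sub-folders
        if entry.is_file():
            groups[entry.stem].add(entry.suffix.lower().lstrip("."))
    return dict(groups)


def submittable(groups):
    """Keep only base names that have both a raw and an mzid file."""
    return {base: exts for base, exts in groups.items() if {"raw", "mzid"} <= exts}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("run1.raw", "run1.mzid", "run1.mzQC", "run2.mzid"):
        (root / name).touch()
    # Files in a sub-folder are never seen, even though they would match.
    (root / "nested").mkdir()
    (root / "nested" / "run3.raw").touch()
    (root / "nested" / "run3.mzid").touch()
    result = submittable(collect_groups(root))
```

In this sketch only `run1` qualifies: `run2` has no raw file and `run3` sits in a sub-folder. That would explain why passing the parent output folder to `--folder` uploads 0 files.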

rolivella commented 4 years ago

To update Mathias script:

Code at: https://gitlab.ebi.ac.uk/walzer/bulk-pride-submission

git pull
pip3 install .
Successfully installed BulkPrideSubmission-0.0.1
rolivella commented 4 years ago

Example of metadata:

select bar.creationdate,annotation_code.annotation 
from bar inner join annotation_code 
on bar.annotation=annotation_code.code 
where creationdate between "2017-01-01" and "2017-12-31" 
and instrument="f" 
and type="hela";

Result:

+---------------------+--------------------------------+
| creationdate        | annotation                     |
+---------------------+--------------------------------+
| 2017-03-08 20:15:00 | LC and/or MS Troubleshooting   |
| 2017-03-08 00:02:00 | Column and/or precolumn change |
| 2017-04-03 02:06:00 | LC and/or MS service           |
| 2017-05-29 05:43:00 | Calibration                    |
| 2017-06-30 02:10:00 | Calibration                    |
| 2017-08-31 13:01:00 | New QC aliquote                |
| 2017-11-24 00:08:00 | LC and/or MS service           |
| 2017-11-24 00:08:00 | Calibration                    |
| 2017-11-24 00:08:00 | Cleaning                       |
| 2017-11-28 03:34:00 | LC and/or MS Troubleshooting   |
| 2017-11-29 04:02:00 | LC and/or MS service           |
| 2017-12-13 23:15:00 | Column and/or precolumn change |
| 2017-12-13 23:15:00 | LC and/or MS Troubleshooting   |
+---------------------+--------------------------------+

rolivella commented 4 years ago
qcloud@nextflow:/users/pr/qcloud/test/pride_submission$ BulkPRIDESubmission --prepared input_data.md --folder files/QC02_b12f8a8457763d222c51561e8d2b2bf9/
No prepared submission settings readable, will overwrite!
You will need your sample processing protocol available in a file named `sample_processing_protocol.md`!
You will need your data processing protocol available in a file named `data_processing_protocol.md`!
You will be prompted for a number of informations about the submission.
Ready? [y/N]: y
Here we go!
Please enter your name: Roger Olivella       
Please enter your email: roger.olivella@crg.eu
Please enter your affiliation (i.e. institution): CRG
Please enter your username for pride login: roger.olivella@crg.eu
Please enter your lab head's name: Eduard Sabidó
Please enter your lab head's email: eduard.sabido@crg.eu
Please enter your lab head's affiliation (i.e. institution): CRG
Please enter your a project title: test 4
Please enter your a concise project description: test 4
Ok? [y/N]: y
Keywords (comma sparated, finalised by enter): test, qcloud
Your keywords test,qcloud
Ok? [y/N]: y
Are all your samples of one organism and one tissue type? [y/N]: y
Enter species type: Homo sapiens (Human)                                                                                                               
Enter tissue type: HeLa cell                                                                                                                           
Enter experiment type: Proteogenomics                                                                                                                  
Collecting metadata from files within the given folders.
['files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.ok.mzML', 'files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.mzid', 'files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.mzQC', 'files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.featureXML', 'files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.qcml']
WARNING:root:Ignoring these unmatched files: files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.ok.mzML,files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.mzid,files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.mzQC,files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.featureXML,files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.qcml
Enter provided ftp password: ********                                                                                                                  
Enter provided ftp folder name: CRG_bulk_PX                                                                                                            
Uploading 0 files...                                                                                                                                   
   0.0% [=====================================================================================================================>]   0/  ? eta [?:??:??]
Done. Thank you for choosing BulkPRIDESubmission. Have a great day!
**ERROR:asyncio:Task was destroyed but it is pending!
task: <Task pending coro=<Renderer.wait_for_cpr_responses.<locals>.wait_for_timeout() done, defined at /users/pr/qcloud/.local/lib/python3.6/site-packages/prompt_toolkit/renderer.py:504> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f00005ab5e8>()]>>**
rolivella commented 4 years ago

I'm trying to understand the issue with the ".ok.". As I mentioned before, the warning I get is this:

WARNING:root:Ignoring these unmatched files: files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.ok.mzML,files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.mzid,files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.featureXML,files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.qcml,files/QC02_b12f8a8457763d222c51561e8d2b2bf9/QC02_b12f8a8457763d222c51561e8d2b2bf9.mzQC

But in the folder:

/QC02_b12f8a8457763d222c51561e8d2b2bf9 I already have the file

QC02_b12f8a8457763d222c51561e8d2b2bf9.ok.mzML

which is the one referenced in the mzID, so where's the problem?

rolivella commented 4 years ago

According to Mathias: "let me explain how I went about the matching. I require the tool to find file base name matching triplets from raw, mzid, and mzml so that I have a direct line of progenitor files. I think, having that is essential."

However, I'm currently not including the raw file in the output, so this could be the reason why I can't upload to the FTP?
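
A rough check of the triplet rule against the folder listing from the earlier log, as a sketch only. It assumes the base name is everything before the last extension, which is a guess about how the tool compares names. `missing_types` is a hypothetical helper:

```python
from pathlib import PurePosixPath

# File types Mathias says must share a base name.
REQUIRED = ("raw", "mzid", "mzml")


def missing_types(file_names, base):
    """Return the required types with no file named exactly `<base>.<type>`."""
    present = {
        PurePosixPath(name).suffix.lower().lstrip(".")
        for name in file_names
        if PurePosixPath(name).stem == base
    }
    return [kind for kind in REQUIRED if kind not in present]


base = "QC02_b12f8a8457763d222c51561e8d2b2bf9"

# Listing as reported in the log above: no raw file, and only the .ok.mzML variant.
folder_listing = [
    base + ".ok.mzML",
    base + ".mzid",
    base + ".mzQC",
    base + ".featureXML",
    base + ".qcml",
]
missing = missing_types(folder_listing, base)

# Adding a raw file and a plain .mzML copy completes the triplet under this assumption.
completed = missing_types(folder_listing + [base + ".raw", base + ".mzML"], base)
```

Under that assumption the `.ok.mzML` file has the stem `<base>.ok`, so it does not count as the mzML for `<base>`, and the raw file is absent as well. This is consistent with the earlier note to duplicate mzML and ok.mzML.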

rolivella commented 4 years ago

Toy dataset completely tested.