proteomicsunitcrg / qcloud2-pipeline

QCloud2 pipeline
5 stars 1 forks source link

Reprocess data for Elixir #18

Closed rolivella closed 4 years ago

rolivella commented 5 years ago

Count files to process:

find /users/pr/data/orbitrap_velos/Raw/17*/QC2V -name '*.raw' ! -name '*CORRUPTED*' -exec du -s {} \; | sort -n | wc -l 192

find /users/pr/data/orbitrap_velos/Raw/17*/QC2V -name '*.raw' ! -name '*CORRUPTED*' -exec du -hs {} \; | sort -n From 17M to 1.4G

find /users/pr/data/orbitrap_velos/Raw/18*/QC2V -name '*.raw' ! -name '*CORRUPTED*' -exec du -s {} \; | sort -n | wc -l 143

find /users/pr/data/orbitrap_velos/Raw/18*/QC2V -name '*.raw' ! -name '*CORRUPTED*' -exec du -hs {} \; | sort -n From 2.2M to 1.2G

find /users/pr/data/orbitrap_fusion/Raw/17*/QC2F -name '*.raw' ! -name '*CORRUPTED*' -exec du -s {} \; | sort -n | wc -l 123

find /users/pr/data/orbitrap_fusion/Raw/17*/QC2F -name '*.raw' ! -name '*CORRUPTED*' -exec du -hs {} \; | sort -n From 5.2M to 1.8G

find /users/pr/data/orbitrap_fusion/Raw/18*/QC2F -name '*.raw' ! -name '*CORRUPTED*' -exec du -s {} \; | sort -n | wc -l 155

find /users/pr/data/orbitrap_fusion/Raw/18*/QC2F -name '*.raw' ! -name '*CORRUPTED*' -exec du -hs {} \; | sort -n From 126M to 1.6G

2017 2018
orbitrap_velos 192 143
orbitrap_fusion 123 155

Total: 613

rolivella commented 5 years ago

Procedure:

To solve:

rolivella commented 5 years ago

Annotations Velos HeLa 2017 and 2018:

SELECT 
file.idfile as File_ID,#File ID
file.creationdate as Creation_date,#Date of the run
file.filename as Filename,#Name of the file
annotation_code.annotation as Op_annotation,#The annotation made by our operator
NULLIF(feature.rt,0) as RT_sec,#Retention Time, onverts 0 to NULL
NULLIF(feature.mz,0) as MZ,# M/Z
NULLIF(feature.intensity,0) as Peptide_Area,#Peptide Area 
feature.sequence as Peptide_sequence,#Peptide sequence
NULLIF(feature.fwhm,0) as FWHM,#Full width at half maximum 
NULLIF(feature.mass_accuracy,0) as Mass_accuracy_ppm#Mass accuracy in ppm
FROM file
INNER JOIN feature
ON feature.idfile=file.idfile
LEFT JOIN bar
ON bar.creationdate=file.creationdate
LEFT JOIN annotation_code
ON annotation_code.code=bar.annotation
WHERE file.creationdate BETWEEN '2017-01-01 00:00:00' AND '2018-12-31 23:59:59'#Time window
AND file.instrument = "f"#Orbitrap Fusion Lumos
AND openms <> "ERR"#Discard RAW files that were not able to be processed by our pipeline
AND file.type="hela"#only HELA
AND annotation_code.annotation IS NOT NULL;

2017_2018_LUMOS_annotations.zip

Annotations Velos QC02 2017 and 2018:

SELECT 
file.idfile as File_ID,#File ID
file.creationdate as Creation_date,#Date of the run
file.filename as Filename,#Name of the file
annotation_code.annotation as Op_annotation,#The annotation made by our operator
NULLIF(feature.rt,0) as RT_sec,#Retention Time, onverts 0 to NULL
NULLIF(feature.mz,0) as MZ,# M/Z
NULLIF(feature.intensity,0) as Peptide_Area,#Peptide Area 
feature.sequence as Peptide_sequence,#Peptide sequence
NULLIF(feature.fwhm,0) as FWHM,#Full width at half maximum 
NULLIF(feature.mass_accuracy,0) as Mass_accuracy_ppm#Mass accuracy in ppm
FROM file
INNER JOIN feature
ON feature.idfile=file.idfile
LEFT JOIN bar
ON bar.creationdate=file.creationdate
LEFT JOIN annotation_code
ON annotation_code.code=bar.annotation
WHERE file.creationdate BETWEEN '2017-01-01 00:00:00' AND '2018-12-31 23:59:59'#Time window
AND file.instrument = "v"#Orbitrap Velos
AND openms <> "ERR"#Discard RAW files that were not able to be processed by our pipeline
AND file.type="hela"#only HELA
AND annotation_code.annotation IS NOT NULL;

2017_2018_VELOS_annotations.zip

rolivella commented 4 years ago

Types of files (HeLa), approx. size, date range, anything that groups them in some way (QC02)