hputnam / Meth_Compare


Provide md5 for trimmed files #33

Closed sr320 closed 4 years ago

sr320 commented 4 years ago

on Meth_Compare_Pipeline.md

shellywanamaker commented 4 years ago

updated Meth_Compare_Pipeline.md with MD5 for trimmed files.

MD5sum.txt files can be found here:

On Gannet:

Rsync from Mox to Gannet log here: https://gannet.fish.washington.edu/metacarcinus/FROGER_meth_compare/20200311/mox2gannet_rsync.log

Or they are still currently on Mox here:

/gscratch/scrubbed/strigg/analyses/20200311/RRBS/md5sum.txt
/gscratch/scrubbed/strigg/analyses/20200311/WGBS_MBD/md5sum.txt

shellywanamaker commented 4 years ago

@sr320 you also commented "WOULD LOVE TO SEE CONCRETE VALIDATION ON TRIMMING ABOVE"

What are you hoping for in terms of validation? Do we need to re-open this issue? https://github.com/hputnam/Meth_Compare/issues/14

sr320 commented 4 years ago

Just that a third party reading Meth_Compare_Pipeline.md should be convinced both that the work can be reproduced and that trimming was done properly (e.g., by visualizing something in the markdown file).

sr320 commented 4 years ago

Also, I would say the reader cannot easily see or verify that the md5s of the files you trimmed match the Genewiz md5s.
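One way to make that comparison visible is to put the vendor checksum list and the locally computed list side by side and report any difference. The following is only a minimal sketch: the file names (`sample_R1.fastq`, `vendor.md5`, `local.md5`) are hypothetical stand-ins, and it assumes the vendor list uses the standard `md5sum` output format with the same relative file names.

```shell
# Sketch only: all file names here are hypothetical stand-ins.
set -eu

# Work in a scratch directory so nothing real is touched
work_dir="$(mktemp -d)"
cd "${work_dir}"

# Stand-in for a raw sequencing file that was used as trimming input
printf 'raw_reads\n' > sample_R1.fastq

# Stand-in for the vendor-supplied checksum list
# (in practice this file would come with the data delivery)
md5sum sample_R1.fastq > vendor.md5

# Checksums computed locally on the files that were actually trimmed
md5sum sample_R1.fastq > local.md5

# Sort both lists by file name (field 2) so line order does not matter
sort -k2 vendor.md5 > vendor.sorted.md5
sort -k2 local.md5 > local.sorted.md5

# diff exits 0 when the lists are identical, non-zero otherwise
if diff vendor.sorted.md5 local.sorted.md5 > md5_diff.txt; then
  match_status=0
  echo "All local checksums match the vendor list"
else
  match_status=1
  echo "Checksum mismatches found:"
  cat md5_diff.txt
fi
```

Pasting the resulting message (or the contents of `md5_diff.txt`) into the pipeline markdown would let a reader confirm the match without rerunning anything.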

shellywanamaker commented 4 years ago

@sr320 I can post examples of fastqc sequence diversity plots before and after trimming in the markdown file. Would that help or is there a better way to show this? Or should the validation go in a separate file?

shellywanamaker commented 4 years ago

for your second point, I'm confused. can we chat?

shellywanamaker commented 4 years ago

this has been completed and the Meth_Compare_Pipeline.md has been updated.

kubu4 commented 4 years ago

Adding this for potential future reference...

@shellytrigg I looked at your code and you can greatly simplify this in the future by using the built-in --check argument of the md5sum program.

Basically, this is how the whole process would go:

# Change to directory with files that need checksums
cd working_dir

# Generate checksums
md5sum check_this_file.fastq.gz > checksums.md5

# Verify checksums at a later date
md5sum --check checksums.md5

md5sum can use a checksum file (which contains a list of files and their corresponding checksums) as a means to verify checksums. The output will be a list of the filenames and an indication of pass/fail.
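The same approach extends to many files at once. The sketch below is an illustration under assumptions, not the pipeline's actual code: the file names are hypothetical, it assumes GNU coreutils `md5sum`, and it deliberately modifies one file afterwards to show what a failed check looks like.

```shell
# Sketch only: file names are hypothetical; assumes GNU coreutils md5sum.
set -eu
export LC_ALL=C   # keep the OK/FAILED messages in a predictable locale

# Work in a scratch directory so nothing real is touched
work_dir="$(mktemp -d)"
cd "${work_dir}"

# Stand-ins for trimmed read files
printf 'read_1\n' > sample_R1.fastq
printf 'read_2\n' > sample_R2.fastq

# Generate checksums for every matching file in a single call
md5sum *.fastq > checksums.md5

# Verify: one "filename: OK" line per file, exit status 0 if all pass
if md5sum --check checksums.md5 > check_ok.log; then
  ok_status=0
else
  ok_status=$?
fi
cat check_ok.log

# Simulate a corrupted or altered file
printf 'extra\n' >> sample_R2.fastq

# Verify again: the altered file is reported as FAILED and
# md5sum exits non-zero (its summary warning goes to stderr)
if md5sum --check checksums.md5 > check_bad.log 2>/dev/null; then
  bad_status=0
else
  bad_status=$?
fi
cat check_bad.log
```

Because `md5sum --check` sets a non-zero exit status on any failure, the check can gate later steps in a script rather than relying on someone reading the log.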