Closed: cjfields closed this issue 3 years ago
Q: Which resources should we use to run the pipeline on biocluster? (queue=normal, account=h3bionet)
@grendon half the cores on HPCBio are open; maybe start it there and see how long the first batch of samples takes (run the report). We can estimate costs from that.
Samples have been divided into five groups: ESN, GWD, LWK, MSL, and Yoruba. The first four groups each contain fewer than 30 samples; the Yoruba group has 99. The pipeline is being run on each group, and results are being organized into a separate folder per group.
@grbot we have data ready to go; transferring the data to the other hackathon participants could be assigned as a task if needed.
@cjfields I see Gloria has an account on Ilifu. The name of the Ilifu GO endpoint is "Ilifu DTN", and she can use the credentials that were sent in her welcome email. Can we ask her to transfer the files to /cbio/projects/012/stream1/hupan/1kg-100-samples/uiuc please? What is the size of the transfer?
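For reference, a transfer like this is usually driven with the Globus CLI. The sketch below echoes the command rather than running it, and the endpoint UUIDs are placeholders (look up the real ones with `globus endpoint search "Ilifu DTN"`); only the destination path comes from this thread.

```shell
#!/usr/bin/env sh
# Hedged sketch of a Globus CLI transfer. Endpoint UUIDs below are
# placeholders, NOT real IDs; resolve them with `globus endpoint search`.
SRC_EP="00000000-0000-0000-0000-000000000000"   # placeholder: biocluster endpoint UUID
DST_EP="11111111-1111-1111-1111-111111111111"   # placeholder: Ilifu DTN endpoint UUID
SRC_PATH="/home/groups/h3abionet/RefGraph/data/1000genome/Yoruba/"
DST_PATH="/cbio/projects/012/stream1/hupan/1kg-100-samples/uiuc/"

# Echo the command instead of executing it, so the sketch is safe to run as-is.
echo globus transfer --recursive "${SRC_EP}:${SRC_PATH}" "${DST_EP}:${DST_PATH}"
```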
The size of the transfer depends on what file(s) are needed on your end for analysis.
We analyzed almost 200 samples. The final results for each sample are the megahit assembly files plus corresponding metrics. Those files are tiny, only a few kilobytes.
Intermediate results include unmapped reads pre- and post-QC-trim. Are those files needed too? They are larger than the final output files, roughly one tenth the size of the input files.
We also ran MultiQC on each group of samples; that report file is also small.
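To put a number on the transfer size before deciding what to send, `du` can summarize the candidate results directories; the directory path here is illustrative, not the pipeline's actual layout.

```shell
#!/usr/bin/env sh
# Summarize the on-disk size of a results directory before transfer.
# RESULTS_DIR is illustrative; point it at a per-group results folder.
RESULTS_DIR="${1:-.}"

# -s: one summary line per argument; -h: human-readable units (K, M, G)
du -sh "$RESULTS_DIR"
```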
Thank you @grendon
Please transfer everything you have, as it might be useful in the comparison. The 1000 Genomes samples we worked with are listed here. If you have those and can transfer them, that would be great.
done
Yoruba files: metadata and data provenance. See attachments.
data_provenance_Yoruba.txt
metadata_Yoruba.txt
A local mirror of the Yoruba CRAM files is available on biocluster at this location:
/home/groups/h3abionet/RefGraph/data/1000genome/Yoruba/