Closed erichards52 closed 1 year ago
@erichards52,
I'm not 100% sure I understand your question. When you say 'names', do you mean sample names such as 20b87673c1224e9db8bdbbe82899309c in the example you provided (which you want displayed as HG00276 instead)? If so, you would have to change the SM (sample name) tag in the RG (read group) section of the BAM file. Basically, 20b87673c1224e9db8bdbbe82899309c is the sample name GeT-RM has chosen to assign to HG00276. Changing the SM tag is simple in principle, but it can be tricky if you are not familiar with BAM manipulation. My advice is to leave the BAM files as is and just manually change the sample names after you have generated diplotype calls. At least that's what I did for my publication. See the attached file for mapping sample names: TableS1.xlsx.
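If doing the renaming by hand gets tedious, it could be scripted. Below is a minimal Python sketch of that idea, not something taken from pypgx itself. It assumes the diplotype calls have been saved as tab-separated text with the sample ID in the first column; the function name rename_samples and that layout are hypothetical. Only the single ID pair shown comes from this thread, and the remaining pairs would have to be filled in from TableS1.xlsx.

```python
# Mapping from the IDs stored in the BAM SM tags to the familiar sample names.
# Only this one pair appears in the thread; add the others from TableS1.xlsx.
SAMPLE_NAMES = {
    "20b87673c1224e9db8bdbbe82899309c": "HG00276",
}


def rename_samples(table_text, names, id_column=0):
    """Replace sample IDs in one column of tab-separated text.

    IDs that are missing from `names` are left unchanged, so header rows
    and unmapped samples pass through as they are.
    """
    renamed = []
    for line in table_text.splitlines():
        fields = line.split("\t")
        if len(fields) > id_column:
            fields[id_column] = names.get(fields[id_column], fields[id_column])
        renamed.append("\t".join(fields))
    return "\n".join(renamed)


# Hypothetical two-column excerpt (sample ID, genotype), for illustration only.
example = "20b87673c1224e9db8bdbbe82899309c\t*4/*5"
print(rename_samples(example, SAMPLE_NAMES))
```

Because the BAM files are never touched, a wrong entry in the mapping can be fixed by rerunning the script rather than the pipeline.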
Let me know if this is not what you meant.
BTW, HG00276 does have CYP2D6*4/*5 (*5 is gene deletion) so you can be assured that the pipeline ran fine :)
Thank you!
You've been a great help!
This is exactly what I needed :)
Hello,
For reference, I am using GeT-RM BAM files directly from the ENA: https://www.ebi.ac.uk/ebisearch/search?query=PRJEB19931&requestFrom=ebi_index&db=allebi
The data are stored like this, for example: /data/bam_files/ERR195/ERR1955341/NA11993.bam
I am calling the pipeline and all other commands building up to it as such:
I have followed the tutorial and no warnings/issues seem to be present, but when viewing the output of pypgx print-data grch37-CYP2D6-pipeline_get_rm_1/results.zip | head, I get the following:
Could you please tell me how I can get these names to show up in an informative/consistent way?
Thank you.