grhogg closed this 3 years ago.
This addresses issue #42.
We originally designed this function with one subject per file in mind.
If I understand correctly, the file Tissue_Bioidentity_TCRdist.tsv has multiple subject IDs that you want to retain.
Once all the tests pass and this PR has been merged into master, you can reinstall with:
pip install git+https://github.com/kmayerb/tcrdist3.git@master
Then you can add 'subject' to the list passed to the new use_cols argument:
use_cols : list
    Columns to retain from the original input file, e.g. ['bio_identity', 'productive_frequency', 'templates', 'rearrangement', 'subject']. Include 'subject' to keep the subject column; leave it out to fall back to using the filename as the subject, as before.
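To make the two modes concrete, here is a minimal sketch in plain pandas of the behavior that docstring describes. It is a hypothetical stand-in (`select_adaptive_columns` is not part of tcrdist3), it uses toy values, and it assumes the fallback subject is the file's base name; the real function may derive it differently.

```python
import os

import pandas as pd


def select_adaptive_columns(raw: pd.DataFrame, filename: str, use_cols: list) -> pd.DataFrame:
    """Hypothetical illustration of the use_cols option; not tcrdist3's actual code."""
    # Fail loudly if a requested column is absent from the input file.
    missing = [c for c in use_cols if c not in raw.columns]
    if missing:
        raise KeyError(f"columns not found in input: {missing}")
    out = raw[use_cols].copy()
    if "subject" not in use_cols:
        # Old behavior (assumed): one subject per file, named after the file.
        out["subject"] = os.path.basename(filename)
    return out


# Toy data standing in for an Adaptive export that already has a 'subject' column.
raw = pd.DataFrame({
    "bio_identity": ["CASSF+TCRBV05-01+TCRBJ02-01", "CASSG+TCRBV07-09+TCRBJ01-02"],
    "productive_frequency": [0.01, 0.02],
    "templates": [10, 20],
    "rearrangement": ["ACGT", "TGCA"],
    "subject": ["1000-Tissue_TCRB", "1001-Tissue_TCRB"],
})

base = ["bio_identity", "productive_frequency", "templates", "rearrangement"]
kept = select_adaptive_columns(raw, "data/Tissue.tsv", base + ["subject"])
print(kept["subject"].tolist())  # ['1000-Tissue_TCRB', '1001-Tissue_TCRB']
dropped = select_adaptive_columns(raw, "data/Tissue.tsv", base)
print(dropped["subject"].tolist())  # ['Tissue.tsv', 'Tissue.tsv']
```

The point of the sketch is only that listing 'subject' switches from a per-file label to the per-row values already in the file.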
Still no luck on my end:
pip uninstall tcrdist3
pip install git+https://github.com/kmayerb/tcrdist3.git
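Before digging further, it can help to confirm which tcrdist3 build the interpreter actually sees. A minimal sketch using only the standard library (Python 3.8+ for importlib.metadata); the helper names `installed_version` and `has_parameter` are hypothetical. A master install may report the same version string as a release, so the signature check is the more telling of the two.

```python
import inspect
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def has_parameter(func, name: str) -> bool:
    """Return True if `func` declares a parameter called `name`."""
    return name in inspect.signature(func).parameters


print("tcrdist3 version:", installed_version("tcrdist3"))
try:
    from tcrdist.adpt_funcs import import_adaptive_file
    print("use_cols supported:", has_parameter(import_adaptive_file, "use_cols"))
except ImportError:
    print("tcrdist3 is not importable in this environment")
```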
import pandas as pd
temp = pd.read_csv("Tissue_Bioidentity_TCRdist.tsv", sep='\t')
temp = temp.rename(columns={"sample_name": "subject"})
temp.head()
filenameTSV = 'Tissue_Bioidentity_TCRdist.tsv'
with open(filenameTSV,'w') as write_tsv:
    write_tsv.write(temp.to_csv(sep='\t', index=False))
from tcrdist.adpt_funcs import import_adaptive_file, adaptive_to_imgt
df = import_adaptive_file(adaptive_filename = "Tissue_Bioidentity_TCRdist.tsv", use_cols = ['bio_identity', 'productive_frequency', 'templates', 'rearrangement', 'subject'])
pd.options.display.max_colwidth = 100
df
Let me know if I'm making some dumb mistake. Thanks!
Well, the fact that you successfully included `use_cols = ['bio_identity', 'productive_frequency', 'templates', 'rearrangement', 'subject']` suggests that you reinstalled correctly.
I think the issue might be that you aren't writing out a file with the 'subject' column.
Try this; note line 4, which writes to a new filename:
temp = pd.read_csv("Tissue_Bioidentity_TCRdist.tsv", sep='\t')
temp = temp.rename(columns={"sample_name": "subject"})
temp.head()
temp.to_csv("Tissue_Bioidentity_TCRdist_with_subject.tsv", sep = "\t", index = False)
Then load that file, not the original:
from tcrdist.adpt_funcs import import_adaptive_file, adaptive_to_imgt
df = import_adaptive_file(adaptive_filename = "Tissue_Bioidentity_TCRdist_with_subject.tsv", use_cols = ['bio_identity', 'productive_frequency', 'templates', 'rearrangement', 'subject'])
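To test the hypothesis that the written file lacks the column, a quick sketch that reads only the header and the subject column of a TSV. `check_subject_column` is a hypothetical helper, not part of tcrdist3, and it assumes a tab-separated file written as above.

```python
import pandas as pd


def check_subject_column(path: str, column: str = "subject") -> list:
    """Return the sorted distinct values of `column` in a TSV, or raise if it is missing."""
    # nrows=0 parses only the header line.
    header = pd.read_csv(path, sep="\t", nrows=0).columns.tolist()
    if column not in header:
        raise ValueError(f"{path!r} has no {column!r} column; found {header}")
    values = pd.read_csv(path, sep="\t", usecols=[column])[column]
    return sorted(values.dropna().unique().tolist())
```

For example, `check_subject_column("Tissue_Bioidentity_TCRdist_with_subject.tsv")` should list sample names such as 1000-Tissue_TCRB if the rename worked. If it does, the file itself is fine and the problem lies in the import step.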
Hmmm, I don't think this is the issue. I had previously just overwritten the original to contain the column header "subject", but creating a separate file named "Tissue_Bioidentity_TCRdist_with_subject.tsv" doesn't seem to solve the problem either. I apologize that I'm getting so caught up on a minor import function.
import pandas as pd
temp = pd.read_csv("Tissue_Bioidentity_TCRdist.tsv", sep='\t')
temp = temp.rename(columns={"sample_name": "subject"})
temp.head()
temp.to_csv("Tissue_Bioidentity_TCRdist_with_subject.tsv", sep = "\t", index = False)
from tcrdist.adpt_funcs import import_adaptive_file, adaptive_to_imgt
df = import_adaptive_file(adaptive_filename = "Tissue_Bioidentity_TCRdist_with_subject.tsv", use_cols = ['bio_identity', 'productive_frequency', 'templates', 'rearrangement', 'subject'])
pd.options.display.max_colwidth = 100
df
Hello, I am running into an issue when converting an exported Adaptive immunoSEQ data set: the Sample Name column (e.g. 1000-Tissue_TCRB) is dropped during conversion, and the output Subject column contains only the file name.
Any thoughts on how to troubleshoot this would be greatly appreciated. Thanks!
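One way to pin down this symptom is to compare the sample names in the raw export with the subject values that come back from the import. A minimal pandas sketch; `missing_subjects` is hypothetical, the column names sample_name and subject are taken from the snippets in this thread, and the toy frames (including the second sample name and the filename-derived subject) stand in for the real export and the imported DataFrame.

```python
import pandas as pd


def missing_subjects(raw: pd.DataFrame, imported: pd.DataFrame,
                     raw_col: str = "sample_name",
                     imported_col: str = "subject") -> set:
    """Return sample names present in the raw export but absent from the imported subject column."""
    expected = set(raw[raw_col].dropna().astype(str))
    observed = set(imported[imported_col].dropna().astype(str))
    return expected - observed


# Toy frames: the import has replaced every sample name with a filename-based label.
raw = pd.DataFrame({"sample_name": ["1000-Tissue_TCRB", "1001-Tissue_TCRB"]})
imported = pd.DataFrame({"subject": ["Tissue_Bioidentity_TCRdist", "Tissue_Bioidentity_TCRdist"]})
print(sorted(missing_subjects(raw, imported)))  # ['1000-Tissue_TCRB', '1001-Tissue_TCRB']
```

An empty result means every sample name survived the conversion; a non-empty one reproduces the behavior described above.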