smdabdoub / kraken-biom

Create BIOM-format tables (http://biom-format.org) from Kraken output (http://ccb.jhu.edu/software/kraken/, https://github.com/DerrickWood/kraken).
MIT License

Add sample_data info to biom file #20

Open Ptero64 opened 2 years ago

Ptero64 commented 2 years ago

Hello, I am using the kraken-biom script to convert kraken2 report files into a BIOM file to use with phyloseq in R. I managed to produce a single BIOM file from 90 kraken reports, but after loading it with import_biom from the phyloseq package I get a phyloseq-class object with only an otu_table and a tax_table, and no sample_table.

How can we add the sample_table to the BIOM file? I also tried biom add-metadata with a text file containing the sample IDs and some group info, but it doesn't seem to work.

Thanks in advance for the help

regards

Nicolas

mawa86 commented 2 years ago

I think the metadata file needs to be in TSV format, and its sample IDs need to be the same as the ones in the BIOM table (rows are matched by sample ID). Not sure if this helps?
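To make the "match by sample ID" check concrete, here is a minimal Python sketch that compares the IDs in a metadata TSV against the sample IDs of the table. It is only an illustration, not part of kraken-biom: the helper names `read_metadata_ids` and `compare_ids` are hypothetical, and it assumes the TSV has a header row with the sample IDs in the first column. How you obtain the table's sample IDs is up to you (for example, by listing the sample names in R).

```python
import csv


def read_metadata_ids(tsv_path):
    """Return the first-column values of a TSV file, skipping the header row.

    Assumption: the first row is a header and the first column holds sample IDs.
    """
    with open(tsv_path, newline="") as handle:
        # Drop completely empty lines so a trailing newline does not matter.
        rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    return [row[0].strip() for row in rows[1:]]


def compare_ids(table_ids, metadata_ids):
    """Return (IDs only in the table, IDs only in the metadata), both sorted."""
    table_set, meta_set = set(table_ids), set(metadata_ids)
    return sorted(table_set - meta_set), sorted(meta_set - table_set)
```

If both returned lists are empty, the IDs agree exactly; anything listed is a sample that would fail to match, for instance because of a leftover file extension or stray whitespace in the ID.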

ayoraind commented 1 year ago

Hi.

I am also facing this challenge. I used the option -m or --metadata to add relevant metadata to generate a biom file having the 'sample_table' together with the otu_table and tax_table. It seems like the script (kraken_biom.py) does not accept my metadata.tsv file. I see this error each time:

KeyError(f"None of [{key}] are in the [{axis_name}]")

I tried adding a hash (#) to the header. That didn't work. I tried removing the header. That didn't work either. The ID column in my metadata file matches the names of the kraken report files, so I don't believe the sample ID is the issue in my case.

@Ptero64, were you able to solve the issue?

Thanks in advance for the help.

Regards,

Ayorinde

MaryoHg commented 1 year ago

Dear @ayoraind,

Hope you're doing OK. I installed kraken-biom two days ago and, from the output of kraken-biom --help, I noticed there is no --metadata option. So I had a similar problem when trying to add metadata to the BIOM table using kraken-biom itself.

However, I added the metadata using the phyloseq package in R. To do that you just need to:

  1. Load your BIOM table: mybiom <- phyloseq::import_biom('/dir/biom_table_wo_metadata.biom')

  2. Create a metadata object with phyloseq: mapping <- phyloseq::import_qiime_sample_data(mapfilename = 'metadata.tsv')

  3. Merge the mybiom and mapping objects into a single phyloseq object to work with: data <- phyloseq::merge_phyloseq(mybiom, mapping)

  4. Rename the taxonomic levels for easier handling: colnames(tax_table(data)) <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")

Hope this helps anyone out there.

Cheers, Maryo.

ayoraind commented 1 year ago

Many thanks, @MaryoHg. It helps!