sbslee / dokdo

A Python package for microbiome sequencing analysis with QIIME 2
https://dokdo.readthedocs.io
MIT License
42 stars 12 forks

KeyError: "['merged'] not in index", after running "dokdo.denoising_stats_plot" #58

Closed. Sara-Mashhadi-Nejad closed this issue 8 months ago

Sara-Mashhadi-Nejad commented 8 months ago

Hi Seung-been,

Thanks a lot for your wonderful code.

As you can see below, I could not attach "qza_file" here. "We don’t support that file type. Try again with GIF, JPEG, JPG, MOV, MP4, PNG, SVG, WEBM, CPUPROFILE, CSV, DMP, DOCX, FODG, FODP, FODS, FODT, GZ, JSON, JSONC, LOG, MD, ODF, ODG, ODP, ODS, ODT, PATCH, PDF, PPTX, TGZ, TXT, XLS, XLSX or ZIP."

So, I shared the files ("qza_file", "qzv_file" and "metadata_file") in a Google Drive folder. Here is the link to the folder: https://drive.google.com/drive/folders/1lRKmI_f51IwwHihucV78LvMJcg3G6Kpe

Thanks, Sara

P.S. code, error, explanation, question:

---Code-------------------------
qza_file = '/home/sara/S1B54/stats-dada2_16S_S1B54.qza'
metadata_file = '/home/sara/S1B54/metadata_sara_R1_Batch1_V3.tsv'

dokdo.denoising_stats_plot(
    qza_file,
    metadata_file,
    'name_abr_source',
    figsize=(8, 6)
)

plt.tight_layout()

---Error-------------------------
KeyError: "['merged'] not in index"

---Explanation-----------------------
The columns of my "denoising-stat.qzv" file:

sample-id, input, filtered, percentage of input passed filter, denoised, non-chimeric, percentage of input non-chimeric

---Question-----------------------
As you can see, my file does not have these two columns: merged, percentage of input merged

sbslee commented 8 months ago

I just requested access to those files! Please accept my request :)

sbslee commented 8 months ago

@Sara-Mashhadi-Nejad,

The error occurred because the denoising_stats_plot method could not handle data from single-end reads; I have only ever worked with paired-end reads ;) Fortunately, the fix was not difficult, so I just updated the 1.17.0-dev branch (bdd6b14), and it now supports data from single-end reads as well.
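
For anyone curious about the mechanics, here is a minimal sketch of how this kind of KeyError arises in pandas and one way to guard against it. This is not the actual Dokdo patch: the toy table, the `expected` list, and the `select_available` helper are all hypothetical and only illustrate selecting a `merged` column that single-end data does not have.

```python
import pandas as pd

# Hypothetical stand-in for single-end DADA2 denoising stats (no 'merged' column).
stats = pd.DataFrame(
    {
        "input": [1000, 1200],
        "filtered": [900, 1100],
        "denoised": [880, 1050],
        "non-chimeric": [850, 1000],
    },
    index=pd.Index(["S1", "S2"], name="sample-id"),
)

# Columns a plotting routine written for paired-end data might ask for (assumed list).
expected = ["input", "filtered", "denoised", "merged", "non-chimeric"]


def select_available(df, columns):
    """Return only the requested columns that exist in df, keeping their order."""
    return df[[c for c in columns if c in df.columns]]


# Selecting a list that includes a missing label raises KeyError.
message = None
try:
    stats[expected]
except KeyError as error:
    message = str(error)

# Filtering the list first avoids the error for single-end data.
subset = select_available(stats, expected)
print(message)
print(list(subset.columns))
```

The same idea applies to any column that only exists for one read layout: check what is present before indexing.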

import dokdo
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()

qza_file = 'stats-dada2_16S_S1B54.qza'
metadata_file = 'metadata_sara_R1_Batch1_V3_modified.tsv'

dokdo.denoising_stats_plot(
    qza_file,
    metadata_file,
    'name_abr_source',
)

plt.tight_layout()
plt.savefig('out.png')

[Image: out.png]

You can install the development version of Dokdo:

$ git clone https://github.com/sbslee/dokdo
$ cd dokdo
$ git checkout 1.17.0-dev
$ pip install .
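
If you want to confirm which version ended up in your environment after installing, a small check with the standard library's `importlib.metadata` should work. This is only a sketch: the `installed_version` helper is a hypothetical name, and I am not stating what version string the development branch reports.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed Dokdo version, or None if Dokdo is not installed.
print(installed_version("dokdo"))
```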

P.S. I also noticed that your metadata file is actually comma-separated (CSV) rather than tab-separated (TSV), despite its .tsv extension. It also lacked the required dtype row (#q2:types), which specifies the data type of each column. I modified your metadata file and used it to generate the plot above. I can't attach the TSV file here, so I will just show the first few lines of the file:

(qiime2-2022.2) sbslee@Seung-beens-MacBook-Air dokdo-test % head -n 5 metadata_sara_R1_Batch1_V3_modified.tsv 
#SampleID   name_abr_source
#q2:types   categorical
001-DERiv-041723    R
002-DERes-041723    RES
003-NAPol-1-041723  R
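
In case it helps, the conversion can be sketched with Python's standard `csv` module. This is an illustration rather than the exact steps I used: the `csv_to_q2_tsv` helper, the `sample` header in the input, and the two example rows are assumptions, and the column types must be supplied by you (`categorical` or `numeric`).

```python
import csv
import io


def csv_to_q2_tsv(csv_text, column_types):
    """Convert CSV metadata text to tab-separated text with a #q2:types row.

    The first CSV column is assumed to hold the sample IDs.
    column_types maps each remaining column name to 'categorical' or 'numeric'.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    columns = header[1:]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["#SampleID"] + columns)
    writer.writerow(["#q2:types"] + [column_types[name] for name in columns])
    writer.writerows(records)
    return out.getvalue()


# Hypothetical CSV input; the real file's header may differ.
csv_text = (
    "sample,name_abr_source\n"
    "001-DERiv-041723,R\n"
    "002-DERes-041723,RES\n"
)
tsv_text = csv_to_q2_tsv(csv_text, {"name_abr_source": "categorical"})
print(tsv_text)
```

To work on real files, read the CSV with `open(path, newline="")` and write the returned text to a new `.tsv` file.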

Hope this helps!