qiime2 / q2-deblur

BSD 3-Clause "New" or "Revised" License

Underscores in sample IDs break the pipeline #67

Closed: thermokarst closed this issue 5 years ago

thermokarst commented 6 years ago

Bug Description

Underscores in sample IDs are not supported in deblur. This breaks in at least two ways: the reference database check fails to find any hits for samples whose IDs contain underscores, and those IDs also appear to be truncated.

Steps to reproduce the behavior

  1. Run denoise-16s with samples with underscores in the IDs.

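Expected behavior

Deblur should work as advertised: sample IDs containing underscores should be processed without losing reference hits or being truncated.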

Screenshots

A user reported that mock community samples were producing the following results:

[Screenshot: results for the mock community samples with underscores in their IDs]

Note that sample HMP_mock_2 has no reads hitting the reference. The user had previously run the same mock community successfully, so this reference miss was surprising.

I reran the same samples through denoise-16s using underscore-less sample IDs:

[Screenshot, 2018-07-12 at 8:14:59 AM: results for the rerun with underscore-less sample IDs]

The sample in question now has the expected number of reads hitting the reference.

Computation Environment

Questions

  1. Perhaps the way to solve this in this plugin is to test sample IDs for underscores and error out when any are found. Thoughts?
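
To make the proposal above concrete, here is a minimal sketch of such a check. It is not taken from q2-deblur: the function names `find_ids_with_underscores` and `check_sample_ids` are hypothetical, and the sketch assumes the plugin can get the sample IDs as plain strings before deblur is invoked.

```python
from typing import Iterable, List


def find_ids_with_underscores(sample_ids: Iterable[str]) -> List[str]:
    """Return the sample IDs that contain at least one underscore, sorted."""
    return sorted(sid for sid in sample_ids if "_" in sid)


def check_sample_ids(sample_ids: Iterable[str]) -> None:
    """Raise ValueError if any sample ID contains an underscore.

    Hypothetical helper: the idea is to fail early with a clear message
    instead of letting the run finish with missing reference hits or
    truncated IDs.
    """
    offending = find_ids_with_underscores(sample_ids)
    if offending:
        raise ValueError(
            "Underscores in sample IDs are not supported. "
            "Offending IDs: " + ", ".join(offending)
        )
```

For example, `check_sample_ids(["HMP_mock_2", "HMPmock1"])` would raise a `ValueError` naming `HMP_mock_2`, while `check_sample_ids(["HMPmock1", "HMPmock2"])` would return without error. Listing every offending ID in the message lets the user rename them all in one pass.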

References

  1. Reference DB issue (forum)
  2. Truncated ID issue (forum)

thermokarst commented 6 years ago

Forum xref