Open erinyoung opened 1 year ago
This is interesting, thank you! I think any contribution would be appreciated. I haven't yet updated the spec to say where we would host the datasets, but I will brainstorm more. For now, I think a dataset with accessions and perhaps AMR results would be the most helpful. Let me know!
Don't thank me just yet.
I've attached a file that may be helpful to you.
There are six columns in this file that designate
There are some caveats to this file. It may contain assemblies or SRA accessions that are not yet publicly available. Also, some of these isolates may carry their AMR gene on the chromosome rather than on a plasmid. I wanted to vet these problems first, but I do not think I'll have the time for that for a while.
I may come back and edit or filter this information in the future, but it's here in case it is useful in the meantime.
Hi @erinyoung,
Just came across this issue while looking for more benchmarking datasets for my tool Plassembler, which implements a good chunk of what you outline :) It doesn't go down to the individual plasmid level, though.
It's still a work in progress, but I thought I would share. Based on your comments, I think I'll implement a "--keep fastqs" flag now, since others may find it useful, so thanks for that!
https://github.com/gbouras13/plassembler
George
I would like to contribute to this effort, but I want to make sure that my methods are sound. I would love feedback and insight.
I think I can create a toy dataset for some plasmids containing AMR genes.
Here's my current plan: