Closed JudithR closed 1 year ago
Hi Judith,
Thank you for your interest in GX!
The output looks correct, and GX is asserting your genome is clean. The taxonomy matches listed under "Top represented divs:" are those with the highest aggregate alignment coverage, but they aren't necessarily included in the final contamination report, because various filtering criteria are used to ignore lower-confidence contaminant assignments. We will modify our reporting to reduce confusion; thanks for pointing this out.
The taxonomy report shows four fish assignments that are short spans (lengths are in the columns immediately after anml:fishes) and are repeat-rich (column 3). It would be helpful if you could copy the full set of rows for these scaffolds, not just the contaminant ones. From the information you provided, I'm guessing GX throws out these calls because they are short, intra-kingdom spans. Animal-in-animal contamination is only reported if it is of sufficient length and coverage.
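For anyone who wants to pull the full set of rows for specific scaffolds out of a taxonomy report, here is a minimal Python sketch. It assumes the report is tab-separated, that header lines start with "#", and that the sequence ID is in the first column. The function name, the column names, and the example rows are hypothetical and only illustrate the filtering step; check them against your own file.

```python
def rows_for_scaffolds(report_text, scaffold_ids, id_column=0):
    """Return every data row whose ID column matches one of scaffold_ids.

    Assumes a tab-separated report in which header lines start with '#'.
    """
    wanted = set(scaffold_ids)
    rows = []
    for line in report_text.splitlines():
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[id_column] in wanted:
            rows.append(fields)
    return rows


# Hypothetical miniature report, not real GX output.
example = (
    "#seq-id\tlabel\n"
    "scaffold_1\tprimary\n"
    "scaffold_1\tcontaminant\n"
    "scaffold_2\tprimary\n"
)

selected = rows_for_scaffolds(example, ["scaffold_1"])
for row in selected:
    print("\t".join(row))
```

To use it on a real file, read the report with open(path).read() and pass the IDs of the flagged scaffolds; all rows for those scaffolds are kept, not only the ones labeled as contaminant.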
Thx for the quick response. And a very reassuring answer. We used blobtools before and found the same off-taxon hits and also concluded that it wasn't likely real contamination. Please find attached the complete output for the 3 flagged scaffolds. possible_contaminants.txt
Hi Judith,
Thanks for sending these. Indeed, it looks like what I suspected: the sequences are predominantly called bird, with small, repeat-laden spans called fish. We've done a fair amount of testing to come up with criteria to remove lower-confidence contamination calls such as these from the final report. You should be good to proceed with your genome as is.
Hi, I used fcs-gx for a bird genome assembly using singularity with version 0.2.3 following the instructions in the wiki. For the test I get similar results to the ones given, the major difference being that fcs-gx now also gives REVIEW suggestions (fcsgx_test.fa.6973.fcs_gx_report.txt).
I ran with $SHM_LOC as file location (in the absence of sudo permissions).
When I run it with all on my data, I get the following log:
The top represented divs suggest the presence of taxa other than birds in the assembly. In the taxonomy.rpt I do get 4 lines labeled as contaminant:
However the summary of the report in the log is empty, as is the report itself:
Is this output correct given the logs and the taxonomy report? If not, any suggestions to where it might have gone wrong?
Thx a lot Judith