akk01 opened this issue 3 years ago (status: Open)
You can try tools such as scRepertoire, which is compatible with TRUST4's barcode report output file (*_barcode_report.tsv).
We also provide the script "scripts/trust-barcoderep-to-10X.pl" to convert the barcode report file to 10x VDJ format, which you can then use with other visualization tools.
Hi, scirpy developer here.
@akk01, you should be able to load the TRUST4 result files into scirpy by following the instructions in the "Creating AnnData objects from other formats" section of the data loading tutorial. I agree it would be nice to have a convenience function like the ones for other formats. Feel free to open an issue in the scirpy repository and we can discuss it there.
@mourisl, even better, though, IMO, would be if TRUST4 could add support for output in the AIRR Rearrangement format. This is a standardized file format for VDJ data and is already supported by a wide range of analysis tools, including scirpy. AIRR support would vastly improve the interoperability between TRUST4 and other tools.
Thanks for the suggestions! I'll work on a script to convert to the AIRR format.
TRUST4 now also outputs files in AIRR formats in the new 1.0.6 version (https://github.com/liulab-dfci/TRUST4/releases/tag/v1.0.6).
@akk01 You can pull the new TRUST4 version and run it with the option "--stage 3", which should generate the AIRR format files from the existing results.
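For anyone scripting this, here is a minimal sketch of how the rerun command could be assembled from Python. Only "--stage 3" comes from this thread; the other flags (-b, -f, --ref, -o) are the usual run-trust4 options, and the file names and the helper name build_stage3_command are hypothetical placeholders. It is an assumption that the options of the original run (for example its barcode settings) need to be repeated so the existing results are found under the same prefix; check run-trust4's help output before relying on it.

```python
import shlex


def build_stage3_command(prefix, annotation_fa, ref_fa, bam=None, extra_args=()):
    """Assemble a run-trust4 command that restarts at stage 3 (report generation).

    The command is only built here, not executed. `extra_args` is a
    hypothetical hook for repeating options from the original run.
    """
    cmd = ["run-trust4"]
    if bam is not None:
        cmd += ["-b", bam]  # same input as the original run (assumed to be needed)
    cmd += ["-f", annotation_fa, "--ref", ref_fa]
    cmd += ["-o", prefix]  # must match the prefix of the existing results
    cmd += list(extra_args)
    cmd += ["--stage", "3"]  # option named in this thread
    return cmd


# Placeholder file names; substitute the ones from your original run.
cmd = build_stage3_command(
    "TRUST_sample", "hg38_bcrtcr.fa", "human_IMGT+C.fa", bam="sample.bam"
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the paths are correct.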
@grst Thanks for the suggestions!
Awesome :)
Hi @mourisl and @grst, thank you for developing these packages. I have a problem loading TRUST4 output into scirpy and am not sure which program is causing the issue.
Using scirpy's read_airr to load the _barcode_airr.tsv file output by TRUST4...
import scirpy
airr_anndata = scirpy.io.read_airr("path/to/_barcode_airr.tsv")
airr_anndata.obs
...only outputs the log
WARNING: locus column not found in input data. The locus is being inferred from the {v,d,j,c}_call columns.
and then shows cell_ids without any other column.
The _barcode_airr.tsv file isn't empty. I've attached a screenshot of an example line.
I then tried converting the _barcode_report.tsv to 10X VDJ format using trust-barcoderep-to-10X.pl and reading it in with scirpy's read_10x_vdj, but I face a similar issue where only cell_ids show up:
adata_bcr = scirpy.io.read_10x_vdj(path_bcr_input)
adata_bcr.obs
Would appreciate any direction. Thank you both
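A quick way to rule out the input file is to summarize it without either tool. The sketch below uses only the Python standard library: it counts rows and distinct cell_id values, reports whether a locus column exists, and guesses the locus from the first three letters of a gene call when it does not. That guess only illustrates what the warning above refers to; it is not scirpy's actual inference code. The helper name summarize_airr_tsv and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def summarize_airr_tsv(handle):
    """Summarize an AIRR Rearrangement TSV read from an open text handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames or []
    n_rows = 0
    cells = set()
    loci = Counter()
    for row in reader:
        n_rows += 1
        cell_id = row.get("cell_id")
        if cell_id:
            cells.add(cell_id)
        # Prefer the locus column; otherwise guess from the first non-empty
        # gene call, e.g. "TRBV7-9*01" -> "TRB" (simplified illustration).
        locus = row.get("locus") or ""
        if not locus:
            for key in ("v_call", "d_call", "j_call", "c_call"):
                call = row.get(key) or ""
                if call:
                    locus = call[:3].upper()
                    break
        loci[locus or "unknown"] += 1
    return {
        "has_locus_column": "locus" in columns,
        "has_cell_id_column": "cell_id" in columns,
        "n_rows": n_rows,
        "n_cells": len(cells),
        "loci": dict(loci),
    }


# Made-up rows, not real TRUST4 output.
sample = (
    "sequence_id\tcell_id\tv_call\tj_call\tjunction_aa\n"
    "s1\tAAAC-1\tTRBV7-9*01\tTRBJ2-1*01\tCASSLGQGNEQFF\n"
    "s2\tAAAC-1\tTRAV12-2*01\tTRAJ42*01\tCAVNGGSQGNLIF\n"
    "s3\tAAAG-1\tIGHV3-23*01\tIGHJ4*02\tCAKDYW\n"
)
print(summarize_airr_tsv(io.StringIO(sample)))
```

For a real file, pass open("path/to/_barcode_airr.tsv", newline="") instead of the in-memory sample. If the summary shows rows and cells, the file is fine and the empty .obs has another cause.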
Hi @tsa4002,
if you are using scirpy >=v0.13, this is expected behavior. There was a major update to the data model: the AIRR data is not stored in .obs anymore, but in .obsm["airr"].
All scirpy functions that need it take it directly from there. If you want to extract certain variables for visualization, you can do so with scirpy.get.airr. See this documentation page for more details, and the v0.13 changelog.
Cheers, Gregor
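For readers landing here later, a minimal sketch of that workflow, assuming scirpy >=0.13 is installed. The wrapper name load_trust4_airr and its default variables and chains are hypothetical choices for illustration. The explicit index_chains call is a precaution and may be redundant if the reader function has already computed the chain indices.

```python
def load_trust4_airr(path, variables=("v_call", "junction_aa"), chains=("VJ_1", "VDJ_1")):
    """Load a TRUST4 *_barcode_airr.tsv file and pull selected AIRR fields."""
    # Imported inside the function so the sketch can be pasted into a session
    # without scirpy installed; the import is only needed when it is called.
    import scirpy as ir

    # The AIRR data ends up in adata.obsm["airr"], not in adata.obs.
    adata = ir.io.read_airr(path)

    # get.airr relies on the chain indices (primary/secondary VJ and VDJ chains).
    ir.pp.index_chains(adata)

    # With several variables or chains this returns one column per combination.
    table = ir.get.airr(adata, list(variables), list(chains))
    return adata, table
```

Usage would look like adata, table = load_trust4_airr("path/to/_barcode_airr.tsv"), after which table.head() should show the per-cell values that used to appear in .obs.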
Thanks @grst ! Was following along the sc-best practices book so definitely helps to know for future steps.
I see! The authors are aware that the chapter is outdated, but I'll ask them to add a notice at the top until they find time to update it.
EDIT: see https://github.com/theislab/single-cell-best-practices/pull/228
With programs like Seurat and Scanpy + scirpy, we can visualize gene expression and VDJ data together. I am wondering which output files from TRUST4 can be used for the same purpose. Do we need to create new files in a specific format so that they can be visualized the same way as 10x single-cell gene expression and VDJ data? Please advise.