liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

Trust4 output file format for VDJ visualization #96

Open akk01 opened 3 years ago

akk01 commented 3 years ago

With programs like Seurat and Scanpy + scirpy, we can visualize gene expression and VDJ data together. I am wondering which TRUST4 output files can be used for the same purpose. Do we need to create new files in a specific format so that they can be visualized the same way as 10x single-cell gene expression and VDJ data? Please advise.

mourisl commented 3 years ago

You can try tools such as scRepertoire, which is compatible with TRUST4's barcode report output file (*_barcode_report.tsv).

We also provide a script, "scripts/trust-barcoderep-to-10X.pl", that converts the barcode report file to 10x VDJ format, so you can use the result in other visualization tools.

grst commented 3 years ago

Hi, scirpy developer here.

@akk01, you should be able to load the TRUST4 result files into scirpy by following the instructions in the "Creating AnnData objects from other formats" section of the data loading tutorial. I agree it would be nice to have a convenience function, as there is for other formats. Feel free to open an issue in the scirpy repository and we can discuss it there.

@mourisl, even better, though, IMO, would be if TRUST4 could add support for output in the AIRR Rearrangement format. This is a standardized file format for VDJ data and is already supported by a wide range of analysis tools, including scirpy. AIRR support would vastly improve the interoperability between TRUST4 and other tools.
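To make the AIRR Rearrangement format mentioned above more concrete: it is a tab-separated table with one row per rearrangement and a header row naming the fields. The following is a minimal sketch using only the Python standard library; the sample rows, barcodes, and the `rows_by_cell` helper are made up for illustration and are not taken from TRUST4 output.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; real AIRR files carry many more columns.
SAMPLE_AIRR_TSV = (
    "sequence_id\tcell_id\tv_call\tj_call\tjunction_aa\tproductive\n"
    "seq1\tAAACCTG-1\tIGHV3-23*01\tIGHJ4*02\tCARDYW\tT\n"
    "seq2\tAAACCTG-1\tIGKV1-39*01\tIGKJ1*01\tCQQSYSTPF\tT\n"
    "seq3\tAAAGATG-1\tTRBV7-9*01\tTRBJ2-1*01\tCASSLGQF\tT\n"
)


def rows_by_cell(handle):
    """Group AIRR rearrangement rows by their cell_id column (hypothetical helper)."""
    cells = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        cells[row["cell_id"]].append(row)
    return dict(cells)


cells = rows_by_cell(io.StringIO(SAMPLE_AIRR_TSV))
for cell_id, rows in cells.items():
    # Print the V gene calls found for each cell barcode.
    print(cell_id, [r["v_call"] for r in rows])
```

For a real file, pass an open file handle instead of the in-memory sample.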

mourisl commented 3 years ago

Thanks for the suggestions! I'll work on a script to convert to the AIRR format.

mourisl commented 3 years ago

TRUST4 now also outputs files in AIRR formats in the new 1.0.6 version (https://github.com/liulab-dfci/TRUST4/releases/tag/v1.0.6).

@akk01 You can pull the new TRUST4 version and run it with the option "--stage 3", which should output the AIRR format files from the existing results.

@grst Thanks for the suggestions!

grst commented 3 years ago

Awesome :)

tsa4002 commented 1 year ago

Hi @mourisl and @grst, thank you for developing these packages. I have a problem loading TRUST4 output into scirpy and am not sure which program is causing the issue.

Using scirpy's read_airr to load the _barcode_airr.tsv file output by TRUST4...

airr_anndata = scirpy.io.read_airr("path/to/_barcode_airr.tsv")
airr_anndata.obs

...only outputs the log

WARNING: locus column not found in input data. The locus is being inferred from the {v,d,j,c}_call columns.

and then shows cell_ids without any other column

[Screenshot: Screen Shot 2023-08-01 at 8 21 31 PM]

The _barcode_airr.tsv file isn't empty. I've attached a screenshot of an example line:

[Screenshot: Screen Shot 2023-08-01 at 8 29 58 PM]

I then tried converting the _barcode_report.tsv to 10X VDJ format using trust-barcoderep-to-10X.pl and reading it in with scirpy's read_10x_vdj, but I face a similar issue: only cell_ids show up.

adata_bcr = scirpy.io.read_10x_vdj(path_bcr_input)
adata_bcr.obs

[Screenshot: Screen Shot 2023-08-01 at 8 37 37 PM]

Would appreciate any direction. Thank you both
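The warning quoted above says that the file has no locus column, so the locus is derived from the gene calls. The idea can be sketched as follows, assuming IMGT-style gene names whose first three letters name the locus (for example IGHV3-23*01 belongs to IGH). This is an illustration of the principle only, not scirpy's actual implementation, and `infer_locus` is a hypothetical helper.

```python
import re

# The seven loci used by the AIRR schema: IGH, IGK, IGL, TRA, TRB, TRD, TRG.
LOCUS_PATTERN = re.compile(r"^(IG[HKL]|TR[ABDG])")


def infer_locus(row):
    """Return the locus implied by the first usable gene call, or None."""
    for column in ("v_call", "d_call", "j_call", "c_call"):
        call = (row.get(column) or "").strip().upper()
        match = LOCUS_PATTERN.match(call)
        if match:
            return match.group(1)
    return None


print(infer_locus({"v_call": "IGHV3-23*01", "j_call": "IGHJ4*02"}))
print(infer_locus({"v_call": "", "j_call": "TRBJ2-1*01"}))
```

The warning by itself is therefore not the reason the table looks empty; it only reports that a fallback was used.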

grst commented 1 year ago

Hi @tsa4002,

If you are using scirpy >= v0.13, this is expected behavior. There was a major update to the data model: the AIRR data is no longer stored in .obs, but in .obsm["airr"]. All scirpy functions that need it take it directly from there. If you want to extract certain variables for visualization, you can do so with scirpy.get.airr. See this documentation page for more details, and the v0.13 changelog.

Cheers, Gregor
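A minimal sketch of what the answer above describes, assuming scirpy >= v0.13 is installed. The helper names `stores_airr_in_obsm` and `chain_table` are hypothetical, and the call to `ir.pp.index_chains` reflects my understanding that chains must be indexed before `scirpy.get.airr` can pick them out; check the scirpy documentation for your version.

```python
import re


def stores_airr_in_obsm(version: str) -> bool:
    """True if this scirpy version uses the .obsm["airr"] data model (>= 0.13)."""
    major, minor = (int(n) for n in re.findall(r"\d+", version)[:2])
    return (major, minor) >= (0, 13)


def chain_table(path):
    """Load an AIRR TSV and return selected per-cell chain variables as a table."""
    import scirpy as ir  # imported lazily so the version helper works without scirpy

    adata = ir.io.read_airr(path)
    # adata.obs is nearly empty by design; the receptor data is in adata.obsm["airr"].
    ir.pp.index_chains(adata)  # assumption: needed before chains can be selected
    return ir.get.airr(adata, ["v_call", "junction_aa"], ["VJ_1", "VDJ_1"])


print(stores_airr_in_obsm("0.12.2"), stores_airr_in_obsm("0.13.0"))
```

Calling `chain_table("path/to/_barcode_airr.tsv")` should then give one row per cell, which can be joined onto `.obs` for plotting.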

tsa4002 commented 1 year ago

Thanks @grst! I was following along with the sc-best-practices book, so this definitely helps to know for future steps.

grst commented 1 year ago

I see! The authors are aware that the chapter is outdated, but I'll ask them to add a notice at the top until they find time to update the chapter.

EDIT: see https://github.com/theislab/single-cell-best-practices/pull/228