njbradley / bcr-dist

A scientific library that computes the relative distances between BCR sequences
MIT License

Using with non-10X data #6

Open bcorrie opened 3 years ago

bcorrie commented 3 years ago

Have you used bcr_dist with non-10X data? I see you can load a "BD" file, but I am not familiar with this format.

I am hoping to map the emerging AIRR Clone format (https://github.com/airr-community/airr-standards/blob/8e07bd75736c3c32e926f08b6042709940794ded/specs/airr-schema.yaml#L3660) into a file that bcr_dist can load.

bcorrie commented 3 years ago

@wyattmcdonnell I guess a follow-on question would be: which fields from the 10X filtered_contig_annotations.csv does the program use? It would probably be fairly easy to convert the AIRR Clone format to this format.

njbradley commented 3 years ago

Hello! So far I haven't used it with any data other than 10x or BD data. Converting to the 10x format might be a little annoying, because in contig_annotations.csv each row is a single chain rather than a single cell (which I think is how the AIRR format is laid out), but you can certainly give it a go. The columns that are used are "barcode", "chain", "cdr3", and then either "cdr1" and "cdr2" or "v_gene". I've been wanting to add more input formats, and maybe change things so that it takes pandas DataFrames to make this kind of conversion easier, but unfortunately I haven't had much time now that school has started. Let me know how everything goes!
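A minimal pandas sketch of the reshaping described above: it turns a table with one row per cell into one row per chain, keeping only the "barcode", "chain", "cdr3" and "v_gene" columns. The per-cell input layout and its column names (`cell_id`, `heavy_cdr3`, `light_v`, and so on) are hypothetical, chosen only for illustration, and the sketch assumes bcr_dist needs nothing beyond these four columns from the CSV; it has not been tested against bcr_dist's loader.

```python
import pandas as pd


def per_cell_to_per_chain(cells: pd.DataFrame) -> pd.DataFrame:
    """Reshape one row per cell into one row per chain (10x-style layout).

    Hypothetical input columns: cell_id plus, for each of "heavy" and
    "light", <prefix>_locus, <prefix>_cdr3 and <prefix>_v.
    """
    frames = []
    for prefix in ("heavy", "light"):
        part = cells[["cell_id", f"{prefix}_locus", f"{prefix}_cdr3", f"{prefix}_v"]]
        # Rename this chain's columns to the 10x column names bcr_dist reads
        frames.append(part.rename(columns={
            "cell_id": "barcode",
            f"{prefix}_locus": "chain",
            f"{prefix}_cdr3": "cdr3",
            f"{prefix}_v": "v_gene",
        }))
    chains = pd.concat(frames, ignore_index=True)
    # A cell with no recovered chain of one type leaves an empty row; drop it
    chains = chains.dropna(subset=["cdr3"])
    return chains.sort_values(["barcode", "chain"]).reset_index(drop=True)


# Toy example: cellB has no light chain
cells = pd.DataFrame({
    "cell_id": ["cellA", "cellB"],
    "heavy_locus": ["IGH", "IGH"],
    "heavy_cdr3": ["CARDYW", "CARGGFDYW"],
    "heavy_v": ["IGHV3-23", "IGHV1-69"],
    "light_locus": ["IGK", None],
    "light_cdr3": ["CQQYNSYPLTF", None],
    "light_v": ["IGKV1-39", None],
})
chains = per_cell_to_per_chain(cells)
print(chains)
```

Writing the result with `chains.to_csv(path, index=False)` would give a file with the same row-per-chain shape as the 10x output, though other 10x columns may still be expected in practice.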

bcorrie commented 3 years ago

Thanks, I will look into this. We plan on using bcr_dist on some 10X data, so we can use that directly, but I am thinking about how to generalize its use to data from other sources, in particular paired-chain data from the AIRR Data Commons.
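Paired-chain data from the AIRR Data Commons typically arrives in the AIRR Rearrangement format, which already has one row per chain plus a `cell_id` linking chains from the same cell. The sketch below assumes that layout and renames fields to the 10x columns listed in this thread. The mapping is an assumption, not something bcr_dist defines: 10x `cdr3` values usually include the conserved cysteine and the trailing W/F, so AIRR `junction_aa` is probably a closer match than `cdr3_aa`, and 10x `v_gene` carries no allele, so only the first `v_call` is kept with its allele stripped. Worth checking against a real 10x file first.

```python
import pandas as pd

# Assumed mapping from AIRR Rearrangement fields to the 10x-style columns
# that bcr_dist reads (barcode, chain, cdr3, v_gene), per this thread.
AIRR_TO_10X = {
    "cell_id": "barcode",
    "locus": "chain",
    "junction_aa": "cdr3",  # assumption: 10x cdr3 includes the flanking C and W/F
    "v_call": "v_gene",
}


def first_gene(v_call):
    """Keep the first call without its allele: 'IGHV3-23*01,IGHV3-23D*01' -> 'IGHV3-23'."""
    if not isinstance(v_call, str):
        return v_call
    return v_call.split(",")[0].split("*")[0]


def airr_to_10x(rearrangements: pd.DataFrame) -> pd.DataFrame:
    """Convert AIRR Rearrangement rows (one per chain) to 10x-style rows."""
    df = rearrangements[list(AIRR_TO_10X)].rename(columns=AIRR_TO_10X)
    # Drop chains lacking a cell_id or a junction; both fill required columns
    df = df.dropna(subset=["barcode", "cdr3"])
    return df.assign(v_gene=df["v_gene"].map(first_gene)).reset_index(drop=True)


# Toy records; s4 has no cell_id, so it is dropped
rearrangements = pd.DataFrame({
    "sequence_id": ["s1", "s2", "s3", "s4"],
    "cell_id": ["c1", "c1", "c2", None],
    "locus": ["IGH", "IGK", "IGH", "IGH"],
    "junction_aa": ["CARDYW", "CQQYNSYPLTF", "CARGGFDYW", "CAKW"],
    "v_call": ["IGHV3-23*01,IGHV3-23D*01", "IGKV1-39*01", "IGHV1-69*02", "IGHV4-34*01"],
})
print(airr_to_10x(rearrangements))
```

If the "cdr1"/"cdr2" route is preferred over "v_gene", the AIRR `cdr1_aa` and `cdr2_aa` fields could be added to the mapping in the same way.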

bcorrie commented 3 years ago

I'm trying to get this installed, and there seem to be quite a few Python dependencies that need to be met to use the 10x_test.py code.

Are these listed anywhere? I am installing them one at a time as I get Python import errors, which is a bit painful 8-(

bcorrie commented 3 years ago

FYI: from a fresh Python virtualenv (Python 3.8), this is what I needed to install:

pip install matplotlib
pip install pandas
pip install scikit-learn
pip install umap-learn
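Since these dependencies were discovered one import error at a time, a small check like the following could report all missing ones in a single pass before running 10x_test.py. The package list is only the four from this comment and may not be complete, and `missing_packages` is a hypothetical helper, not part of bcr_dist. It relies on `importlib.util.find_spec`, which returns `None` for a top-level module that is not installed without actually importing it.

```python
import importlib.util

# pip package name -> import name; note that scikit-learn is imported as
# "sklearn" and umap-learn as "umap"
REQUIRED = {
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "umap-learn": "umap",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Print one pip command covering everything that is missing
        print("pip install " + " ".join(missing))
    else:
        print("All listed dependencies are importable.")
```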

It now seems to be working on our 10X data...