rnabioco / djvdj

An R package to analyze single-cell V(D)J data
https://rnabioco.github.io/djvdj

Malformed input data, multiple clonotype_ids are associated with the same cell barcode #130

Closed Ahmedalaraby20 closed 1 year ago

Ahmedalaraby20 commented 1 year ago

Hey, is there any way for me to filter out cells that have multiple clonotypes? Otherwise import_vdj does not work and I can't analyze my data.

sheridar commented 1 year ago

Hmm, I don't think I've encountered this issue before. I think the best way to handle this is to automatically remove these cells and print a warning. This should be easy to fix. @Ahmedalaraby20, would you be okay with sending me your filtered_contig_annotations.csv file from the cellranger output directory?

sheridar commented 1 year ago

This is fixed in 22fdbfb32137b76a38831ed1b86fa2987ac982d8. Before merging, I would like to look further into why you have a cell barcode with multiple clonotype IDs from cellranger (and double-check that this is in fact the issue with your file).

Ahmedalaraby20 commented 1 year ago

Hey Ryan, this is the file that I have from cellranger. I tried reading it the old way, with read.csv(...), and it worked fine: filtered_contig_annotations.csv

sheridar commented 1 year ago

Hey Ahmed, I just took a quick look at your file. The issue is that the string "No" is included in your file to indicate a missing value (e.g. some low-quality contigs will not receive a clonotype_id, so that field is normally left blank).

Did you generate this file with cellranger vdj, and what version of cellranger did you use? Was this file modified by hand after running cellranger? Usually when a cell is missing a value, it is recorded by cellranger as "NA" or left blank (however, I haven't tested the most recent version of cellranger, v7.1). The presence of the string "No" as a clonotype_id is what caused the error you saw.

In summary, the error message you received is accurate, since your file does not conform to the expected format (i.e. the expectation is that empty values are recorded as "NA" or left blank). To analyze your data, I would recommend using the original filtered_contig_annotations.csv file generated by cellranger that conforms to this format.
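To illustrate why a placeholder string triggers this error, here is a minimal base R sketch. The toy table and its values are invented, and the column name raw_clonotype_id is an assumption about the cellranger output (the thread refers to it simply as clonotype_id). It does not show how djvdj performs its check internally; it only shows that a barcode with one real ID and one "No" looks like a cell with two clonotypes.

```r
# Toy stand-in for filtered_contig_annotations.csv (all values invented).
# "No" plays the role of the hand-edited missing-value placeholder.
csv_text <- "barcode,chain,raw_clonotype_id
AAAC-1,TRA,clonotype1
AAAC-1,TRB,clonotype1
AAAG-1,TRA,clonotype2
AAAG-1,TRB,No
AAAT-1,TRB,No"

# With default settings, only "NA" is treated as missing, so "No" stays a string
contigs <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Count the distinct clonotype IDs recorded for each cell barcode
n_ids <- tapply(
  contigs$raw_clonotype_id,
  contigs$barcode,
  function(x) length(unique(x))
)

# Barcodes that appear to carry more than one clonotype ID
multi <- names(n_ids)[n_ids > 1]
print(multi)

# The IDs behind those barcodes reveal the placeholder
print(unique(contigs$raw_clonotype_id[contigs$barcode %in% multi]))
```

Listing the offending IDs like this is a quick way to tell a real multi-clonotype barcode apart from a formatting problem.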

Ahmedalaraby20 commented 1 year ago

The file was generated via cellranger count, and I had replaced the NAs with "No". I removed the cells with "No" in the clonotype_id column and it worked fine. Thanks a lot.
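A workaround along these lines can be sketched in base R, assuming the placeholder is exactly "No" and the column is named raw_clonotype_id (both are assumptions, and the toy values are invented). Passing the placeholder to na.strings turns it back into a missing value on read, after which the affected rows can be dropped.

```r
# Toy stand-in for a hand-edited contig file (all values invented)
csv_text <- "barcode,chain,raw_clonotype_id
AAAC-1,TRA,clonotype1
AAAC-1,TRB,clonotype1
AAAG-1,TRA,clonotype2
AAAG-1,TRB,No
AAAT-1,TRB,No"

# Treat blank fields, "NA", and the "No" placeholder as missing values
contigs <- read.csv(
  text = csv_text,
  na.strings = c("", "NA", "No"),
  stringsAsFactors = FALSE
)

# Keep only the contigs that have a clonotype ID
kept <- contigs[!is.na(contigs$raw_clonotype_id), ]
print(kept)
```

Note that this removes individual contig rows, so a cell such as AAAG-1 in the toy data keeps its remaining contig, while a cell whose only contig lacks an ID disappears entirely.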

sheridar commented 1 year ago

Glad this is resolved. In the future, I would avoid modifying the filtered_contig_annotations.csv file, since this isn't necessary for using djvdj. I'm closing this; feel free to open another issue if you run into any other problems. Good luck!