rnabioco / djvdj

An R package to analyze single-cell V(D)J data
https://rnabioco.github.io/djvdj

I am trying to import V(D)J data ("filtered_contig_annotations.csv") into a Seurat object, but it keeps giving me an error #139

Closed. rimanpreetkaur closed this issue 7 months ago.

rimanpreetkaur commented 8 months ago

I am trying to analyse publicly available data, for which the authors provide an .rds file and a "filtered_contig_annotations.csv" file, and I keep getting an error. I modified the row names in the Seurat object because they begin with 1.1blood; I changed them by adding a D (e.g. D_1.1_blood), but it still gives me this error when I run:

seurat_vdj <- seurat.integrated |>

Could you please help me resolve this?

Ahmedalaraby20 commented 8 months ago

Hey, I am not the developer, but you can check the cell names in filtered_contig_annotations.csv; they might not match the cell names in your Seurat object.

rimanpreetkaur commented 8 months ago

Hi, thanks for the prompt reply. Yes, I did check, and both have the same names: the Seurat object row names begin with 1_AAA. When I run this code:

vdjdir <- c(
  "1" = file.path(data_dir, "ch1"),
  "2" = file.path(data_dir, "ch2"),
  "3" = file.path(data_dir, "ch3"),
  "4" = file.path(data_dir, "ch4"),
  "5" = file.path(data_dir, "ch5"),
  "6" = file.path(data_dir, "ch6"),
  "7" = file.path(data_dir, "ch7"),
  "8" = file.path(data_dir, "ch8")
)

seurat_harmony_vdj <- exp2 |>
  import_vdj(vdj_dir = vdjdir)

it gives me:

Loading V(D)J data [5.3s]
Error in .prepare_meta():
! meta.data does not contain the same cells as the object, check your cell barcodes
Run rlang::last_trace() to see where the error occurred.
✖ Formatting V(D)J data [19s]

sheridar commented 8 months ago

Hi, the error message "The number of provided cell prefixes does not match the number of unique prefixes present on barcodes" suggests that the number of VDJ samples you are trying to add to the object does not match the number of samples already present in the object.

I would double-check that your object actually contains 8 samples.

I would also try loading the VDJ samples without providing cell prefixes; just make sure the VDJ paths in vdjdir are in the same order as the corresponding GEX samples in the object. By default, import_vdj() automatically detects the cell prefixes present in the object and tries to match them to each VDJ sample.

Try loading using these paths:

vdjdir <- c(
  file.path(data_dir,"ch1"),
  file.path(data_dir,"ch2"),
  file.path(data_dir,"ch3"),
  file.path(data_dir,"ch4"),
  file.path(data_dir,"ch5"),
  file.path(data_dir,"ch6"),
  file.path(data_dir,"ch7"),
  file.path(data_dir,"ch8")
)
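Before calling import_vdj(), it can help to confirm that the object really contains as many barcode prefixes as there are VDJ paths. The sketch below is hypothetical: object_prefixes() is not a djvdj function, and it assumes cell names follow the "<prefix>_<barcode>" pattern that Seurat's merge(add.cell.ids = ...) produces. In practice you would pass colnames() of your own object instead of the toy vector.

```r
# Hypothetical helper (not part of djvdj): list the unique cell-name prefixes,
# in the order they first appear, assuming names look like "<prefix>_<barcode>"
object_prefixes <- function(cells) {
  # Drop the final "_" and everything after it, leaving only the prefix
  unique(sub("_[^_]*$", "", cells))
}

# Toy cell names standing in for colnames(seurat_object)
cells <- c("1_AAACCTGAGAACTCGG-1", "1_AAACGGGTCTTGCCGT-1", "2_AAACCTGAGTGGTAGC-1")
# Stand-ins for the file.path(data_dir, ...) entries
vdjdir <- c("ch1", "ch2")

prefixes <- object_prefixes(cells)
print(prefixes)

# import_vdj() matches VDJ paths to object samples in order, so the counts should agree
if (length(prefixes) != length(vdjdir)) {
  warning("Object has ", length(prefixes), " prefixes but ",
          length(vdjdir), " VDJ paths were given")
}
```

If the printed prefixes are not in the same order as the VDJ paths, reorder vdjdir before importing.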
sheridar commented 8 months ago

Also, you can't modify the cell barcodes in the object by only modifying the barcodes in the meta.data, because the cell barcodes are also stored in other places in the object. Your second error occurs because, once you changed the cell barcodes in the meta.data, they no longer matched the cell barcodes stored elsewhere in the object.
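If the cell names genuinely need to change, Seurat's RenameCells() updates them throughout the object rather than only in meta.data. A minimal sketch, assuming the Seurat package is installed; the toy count matrix and the "D_" prefix are illustrative, not taken from the original data.

```r
library(Seurat)

# Toy 3-gene x 2-cell count matrix standing in for real data
counts <- matrix(
  c(1, 0, 2, 0, 3, 1),
  nrow = 3,
  dimnames = list(
    c("GeneA", "GeneB", "GeneC"),
    c("1_AAACCTGAGAACTCGG-1", "1_AAACGGGTCTTGCCGT-1")
  )
)
so <- CreateSeuratObject(counts = counts)

# Rename cells consistently across assays, meta.data, and any reductions
so <- RenameCells(so, new.names = paste0("D_", colnames(so)))

# The meta.data row names now agree with the object's cell names
stopifnot(identical(rownames(so[[]]), colnames(so)))
```

Note that renaming cells this way changes the prefixes that import_vdj() will detect, so the VDJ paths still need to line up with the new prefixes.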

sheridar commented 8 months ago

Hi @rimanpreetkaur, I'm a little confused about why you are getting this error. Can you send your .rds file and the filtered_contig_annotations.csv file(s)? I will take a look at the cell prefixes to see why you're having issues.

Best, Ryan

rimanpreetkaur commented 8 months ago

Yes, sure. I am trying to analyze publicly available data (GSE194187, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE194187). The authors shared raw files (.h5 and filtered_contig.csv); I made a Seurat object from the .h5 files and then tried to load the VDJ data using the djvdj library, but it is not working. I can create a Seurat object and share it with you if you want.

Thanks & Regards, Dr Rimanpreet Kaur, Postdoctoral Associate, Stony Brook University, New York, Phone: +1 929-606-8983

sheridar commented 7 months ago

Hi @rimanpreetkaur,

The issue you encountered is because some of the VDJ files that were uploaded to GEO include multiple samples per file with different cell barcode suffixes. Normally import_vdj() expects a single sample per file. This will occur if they processed some of the samples separately using cellranger aggr.

import_vdj() can load output files from cellranger aggr (using the aggr_dir argument), but cannot easily load a mix of files where some were processed with aggr and others not.

To load these files into your Seurat object, you can split each file based on the cell barcode suffix (-1 or -2) and write new files. Here is an Rmd you can use to create a Seurat object and add the VDJ data: rimanpreetkaur.zip (https://github.com/rnabioco/djvdj/files/13678850/rimanpreetkaur.zip)

This is not an ideal solution, so my plan is to update import_vdj() to let users easily load a mix of files (some processed with aggr and others with count). Thank you for filing this issue, and let us know if you encounter any other issues.
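The Rmd in the zip is not reproduced here; the following is only an independent sketch of the splitting step described above. It assumes the contig file has a barcode column whose values end in a "-N" suffix, and that import_vdj() will find a file named filtered_contig_annotations.csv inside each directory it is given (check the djvdj documentation). split_contigs_by_suffix() is a hypothetical helper, and barcodes are written back unchanged; whether their suffixes also need editing to match the GEX object depends on how that object was built.

```r
# Hypothetical helper (not part of djvdj): split a contig file that holds several
# samples (barcode suffixes -1, -2, ...) into one directory per suffix
split_contigs_by_suffix <- function(contig_file, out_dir) {
  # Read every column as character so values are written back unchanged
  contigs <- read.csv(contig_file, colClasses = "character", check.names = FALSE)

  # Barcode suffix, e.g. "AAACCTGAGAACTCGG-2" -> "2"
  suffix <- sub("^.*-", "", contigs$barcode)

  out_paths <- character(0)
  for (s in sort(unique(suffix))) {
    sample_dir <- file.path(out_dir, paste0("suffix_", s))
    dir.create(sample_dir, recursive = TRUE, showWarnings = FALSE)
    write.csv(
      contigs[suffix == s, , drop = FALSE],
      file.path(sample_dir, "filtered_contig_annotations.csv"),
      row.names = FALSE
    )
    out_paths[s] <- sample_dir
  }
  out_paths  # named by suffix, one directory per sample
}
```

The returned directories can then be combined with the single-sample directories into one vector for vdj_dir, taking care that the final order matches the order of samples in the object.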

rimanpreetkaur commented 7 months ago

Thank you, I tried it, but it still gave me the same error:

so <- so %>%
  import_vdj(vdj_dir = new_files)

Error in import_vdj():
! The provided cell prefixes (blood1, blood1, liver1, liver1, blood2.A, blood2.B, blood2, liver2.A, liver2.B, liver2, blood3.A, blood3.B, blood3, liver3.A, liver3.B, liver3, liver4, and liver4) do not match those in the input object (blood1, liver1, blood2.A, blood2.B, liver2.A, liver2.B, blood3.A, blood3.B, liver3.A, liver3.B, and liver4_)
Backtrace:
 1. so %>% import_vdj(vdj_dir = new_files)
 2. djvdj::import_vdj(., vdj_dir = new_files)
✖ Loading V(D)J data [1.7s]

rimanpreetkaur commented 7 months ago

I successfully loaded the blood files, but the liver files still give me an error. I tried several changes, such as changing the barcode suffix (-1, -2) in the .csv file, but it still does not work; when it did run, it only added 20-30 entries. I manually checked the barcodes in the Seurat object against the .csv file, and both are the same. Please help me resolve this.
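When only a handful of entries are added, a quick overlap count can show whether the prefixed contig barcodes actually exist among the object's cells. This is a hypothetical diagnostic, not a djvdj function, and it assumes cell names are "<prefix>_<barcode>"; the toy vectors stand in for colnames() of the object and the barcode column of one contig file.

```r
# Hypothetical diagnostic (not part of djvdj): how many unique contig barcodes,
# once given the object's prefix, are present among the object's cells?
count_barcode_matches <- function(object_cells, contig_barcodes, prefix, sep = "_") {
  expected <- paste0(prefix, sep, unique(contig_barcodes))
  c(matched = sum(expected %in% object_cells), total = length(expected))
}

# Toy inputs standing in for colnames(so) and contigs$barcode
object_cells <- c("liver1_AAACCTGA-1", "liver1_AAACGGGT-1")
contig_barcodes <- c("AAACCTGA-1", "AAACCTGA-1", "TTTGTCAT-1")

print(count_barcode_matches(object_cells, contig_barcodes, prefix = "liver1"))
```

A low matched-to-total ratio usually points to a prefix or suffix mismatch (for example, the trailing underscore in the liver4_ prefix reported by the earlier error) rather than to the contig data itself.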

rimanpreetkaur commented 7 months ago

Hi, it worked. I was using the wrong file before. Thanks for your input.

sheridar commented 7 months ago

Great, I'm closing this issue. Let us know if you encounter any other issues.