immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
https://immunarch.com
Apache License 2.0

Missing mandatory FR1.nt column in sample #321

Open Hao961004 opened 1 year ago

Hao961004 commented 1 year ago

❓ Questions and Help

Dear immunarch team

Thank you for this friendly tool.

Now I want to analyze the SHM rate of my sequences. I have loaded the files produced by IMGT-High-V-QUEST.

But when I run the germline generation step, I get: Error in validate_mandatory_columns(., sample_name) : Missing mandatory FR1.nt column in sample covid_H_airr!

I checked the data and there are only 15 columns, so Immunarch may not have read all of the columns. How can I solve this problem?

Best Regards,

Hao

Alexander230 commented 1 year ago

Hi, Hao!

I'm Aleksandr Popov, a developer of the Immunarch package. Thank you for using our software!

The region columns (FR1 - FR4, CDR1 - CDR3) are mandatory for repGermline; the germline cannot be calculated without them. If the columns are present in the input file but are not loaded into the Immunarch data by repLoad, their names in the input file are probably not recognized by the parser. Could you please send an example of your input data? I can check the column names and add them to the parser so that it recognizes them.

Another possibility is that some region columns are missing from the input data. Your preprocessing tool probably has an option that you can turn on to calculate these columns.

Best regards, Aleksandr
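
To tell these two cases apart, it can help to list which region columns are actually present in a loaded sample. The sketch below is not part of Immunarch: `missing_region_columns` is a hypothetical helper, and every column name other than `FR1.nt` is an assumption made by analogy with the name in the error message. With real data, `sample_df` would be one element of `immdata$data`; a mock data frame stands in for it here.

```r
# Hypothetical helper, not an Immunarch function.
# Assumption: region columns follow the "<REGION>.nt" pattern seen in the
# error message ("FR1.nt"); only FR1.nt is confirmed by this thread.
region_columns <- c("FR1.nt", "CDR1.nt", "FR2.nt", "CDR2.nt",
                    "FR3.nt", "CDR3.nt", "FR4.nt")

missing_region_columns <- function(sample_df, required = region_columns) {
  # setdiff keeps the required names that are absent from the data frame
  setdiff(required, names(sample_df))
}

# Mock sample standing in for one element of immdata$data
mock_sample <- data.frame(
  Clones  = c(10, 3),
  CDR3.nt = c("TGTGCGAGA", "TGTGCGAAA"),
  V.name  = c("IGHV1-2", "IGHV3-23"),
  stringsAsFactors = FALSE
)

print(missing_region_columns(mock_sample))
```

If the helper reports missing columns for a sample whose input file does contain region data, that points to the parser not recognizing the column names rather than to the data being absent.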

Hao961004 commented 1 year ago

Hi Aleksandr!

Thanks for your quick reply.

I think the input file has columns with similar names that are not recognized by the parser. I tried renaming the columns directly to match Immunarch's names and loading the file again, but it failed.

I have emailed my example file to you; please check it. Thank you so much!

Regards,

Hao

Hao961004 commented 1 year ago

Hi, Aleksandr!

Here is my example data, which was analyzed by IMGT-High-V-QUEST.

I just uploaded this vquest_airrr.tsv file; please check it.

Thank you so much!

Regards,

Hao


From: Aleksandr Popov @.> Sent: 25 November 2022, 21:08 To: immunomind/immunarch @.> Cc: Zhou Hao @.>; State change @.> Subject: Re: [immunomind/immunarch] Missing mandatory FR1.nt column in sample (Issue #321)


NaniNaniOkotta commented 1 year ago

I have the same issue. Is it possible that FR1-IMGT (and probably the rest of the columns?) need to be renamed?

Hao961004 commented 1 year ago

I tried renaming the columns, but the file still could not be read.

Alexander230 commented 1 year ago

Hi, @Hao961004 and @NaniNaniOkotta! It looks like you are loading input formats for which loading of region columns is not currently implemented in the Immunarch parser.

@Hao961004, where did you upload the example file? I couldn't find it. @NaniNaniOkotta, could you please upload your example input file too?

I will add support for the FR and CDR columns in these formats, so that they can be used with the BCR pipeline.

Best regards, Aleksandr

Hao961004 commented 1 year ago

example.zip

Hi, sorry for my mistake.

I replied to your message by email with the example file attached, and I thought you would receive it.

Here is the example file; please check it.

Thank you so much.

Regards, Hao

NaniNaniOkotta commented 1 year ago

I tried it two different ways: with a file in the same format as the one Hao shared (I got it from IMGT), and with filtered_contig_annotations.csv from single-cell data.

filtered_contig_annotations.csv

Alexander230 commented 1 year ago

Thank you! I will update the parsers for these formats and let you know when the updated version is available.

Best regards, Aleksandr

Alexander230 commented 1 year ago

Hi, @Hao961004 and @NaniNaniOkotta! I've added support for these columns to the parsers for the AIRR and 10x contigs formats. The change is now in the new-bcr-input-formats branch. Eventually it will be merged into dev and then included in the next release of Immunarch.

You can install Immunarch from this branch now with these commands:

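# Install the helper packages used for the installation and reload
install.packages(c("devtools", "pkgload"))
# Install Immunarch from the new-bcr-input-formats branch on GitHub
devtools::install_github("immunomind/immunarch", ref="new-bcr-input-formats")
# Reload the freshly installed package in the current session
devtools::reload(pkgload::inst("immunarch"))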

If you have more questions, feel free to ask them!

Best regards, Aleksandr

NaniNaniOkotta commented 1 year ago

Still not working. If I try the filtered contig .csv I get this error:

-- [1/1] Parsing "WT/filtered_contig_annotations.csv" -- 10x (filt.contigs)
Error in df[, vec_names]:
! Can't subset columns that don't exist.
✖ Column fwr1_nt doesn't exist.

And for IMGT I get the same error as before: Error in validate_mandatory_columns(., sample_name) : Missing mandatory FR1.nt column in sample 3_Nt-sequences!

Alexander230 commented 1 year ago

It looks like remnants of the old version are still in your R environment. Try the following steps to troubleshoot the problem:

  1. Update all installed R packages, as described here: https://www.r-bloggers.com/2014/11/update-all-user-installed-r-packages-again/
  2. Run R --vanilla
  3. Run the following commands:
install.packages(c("devtools", "pkgload"))
devtools::install_github("immunomind/immunarch", ref="new-bcr-input-formats")
library("immunarch")
immdata <- repLoad("/path/to/filtered_contig_annotations.csv")

On my PC it loaded the file successfully, with all CDRx and FRx columns.

Best regards, Aleksandr
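
One way to check which build R actually picks up after these steps is to look at the installed package's metadata. As far as I know, `devtools::install_github` records the requested ref in the installed DESCRIPTION as the `RemoteRef` field; the helper `installed_remote_ref` below is hypothetical and only reads that field, so treat it as a sketch rather than an official check.

```r
# Hypothetical helper, not part of Immunarch or devtools.
# Returns the Git ref recorded at install time, or NA if the package is
# not installed or was not installed from a remote such as GitHub.
installed_remote_ref <- function(pkg) {
  # packageDescription() returns NA (with a warning) for missing packages
  desc <- suppressWarnings(utils::packageDescription(pkg))
  if (!is.list(desc)) {
    return(NA_character_)
  }
  # [[ ]] avoids the partial matching that $ performs on lists
  ref <- desc[["RemoteRef"]]
  if (is.null(ref)) NA_character_ else ref
}

print(installed_remote_ref("immunarch"))
```

If this prints something other than new-bcr-input-formats, the session is probably still using an older installation, which would be consistent with the errors reported above.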

elliew87 commented 1 year ago

Hi, I have a similar issue using AIRR files, but my error is:

Please use select() instead.
Error in select():
! Can't subset columns that don't exist.
✖ Column cdr1 doesn't exist.
Backtrace:

  1. immunarch::repLoad("~//.R/AIRR/")
  2. dplyr:::select_.data.frame(...)
  3. dplyr:::select.data.frame(.data, !!!dots)

Any idea how I can fix this? Many thanks,

Ellie
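
For an error like this, a first diagnostic could be to check whether the AIRR file itself contains the region fields. The sketch below reads only the header line of a tab-separated file and reports which of the AIRR Rearrangement region fields (`fwr1` to `fwr4`, `cdr1` to `cdr3`) are absent. The helper `missing_airr_fields` and the demo file are made up for illustration, and whether missing fields are the cause in this particular case is not established in the thread.

```r
# AIRR Rearrangement field names for the nucleotide regions
airr_region_fields <- c("fwr1", "cdr1", "fwr2", "cdr2", "fwr3", "cdr3", "fwr4")

# Hypothetical helper: reads only the header line of a tab-separated file
# and returns the required field names that are not present in it.
missing_airr_fields <- function(path, required = airr_region_fields) {
  header <- strsplit(readLines(path, n = 1), "\t", fixed = TRUE)[[1]]
  setdiff(required, header)
}

# Demonstration on a small made-up file (not real repertoire data)
demo_path <- tempfile(fileext = ".tsv")
writeLines(c(
  paste(c("sequence_id", "v_call", "junction", "cdr3"), collapse = "\t"),
  paste(c("seq1", "IGHV1-2*02", "TGTGCGAGATGG", "GCGAGA"), collapse = "\t")
), demo_path)

print(missing_airr_fields(demo_path))
```

If fields are reported as missing, the upstream annotation tool would need to export them before the BCR pipeline can use the file.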

danshu01 commented 1 year ago

Hi, I'm having the same issue with IGH data sequenced by Adaptive. What is the best way to resolve it? Thanks, Dan