immunomind / immunarch

🧬 Immunarch: an R Package for Fast and Painless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires
https://immunarch.com
Apache License 2.0

Problem with importing columns that are mostly NAs #379

Open christianwoe opened 10 months ago

christianwoe commented 10 months ago

Hi all,

šŸ› Bug

I was importing data from MiXCR 4.3.1 TSV files and noticed warning messages for some of the samples. On closer inspection, in rare cases a column is typed as logical even though some clones have character content in it. Those values are replaced by NA, so the information is silently discarded. It looks to me like the readr function inside repLoad is guessing the wrong column type, probably because it only checks a subset of rows.

It would be helpful to be able to modify the parameters passed to the readr function, either 'col_types' or 'guess_max'. Or is there already another solution?
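Until it is clear whether repLoad forwards these arguments (not confirmed here), one way to check whether values are being dropped is to read a file with readr directly and let type guessing see every row. The sketch below assumes readr 2.0 or later (for `I()` literal input) and uses hypothetical toy data in place of a real MiXCR export; the column name and the value `"TRBD1*00(40)"` are illustrative only. It does not load anything into immunarch.

```r
library(readr)

# Hypothetical toy data: 1500 clones where "allDHitsWithScore" is empty in
# every row except the last one (empty cells are read as NA by default).
n <- 1500
d_hits <- rep("", n)
d_hits[n] <- "TRBD1*00(40)"
tsv <- paste0(
  "cloneId\tallDHitsWithScore\n",
  paste0(seq_len(n), "\t", d_hits, collapse = "\n"),
  "\n"
)

# guess_max = n lets readr inspect every row before choosing a type, so the
# single non-empty value makes the column character instead of logical.
clones <- read_tsv(I(tsv), guess_max = n, show_col_types = FALSE)
class(clones$allDHitsWithScore)
clones$allDHitsWithScore[n]
```

For a real file, the guessing window has to be at least as large as the number of clones, which makes reading slower on big repertoires.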

To Reproduce

Steps to reproduce the behavior:

  1. `repLoad(pathname)`

This produces the following warning:

Warning: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)

Expected behavior

Columns that contain at least one non-NA, non-logical value are not typed as logical.

Many thanks and kind regards, Christian

vadimnazarov commented 9 months ago

Hi @christianwoe

Thank you for opening the issue. Could you please share an example of such data? Which columns are usually the problematic ones?

I'm open to scheduling a short call to discuss this issue over Zoom if this accelerates things.

christianwoe commented 9 months ago

Hi, here is an example based on test data. I think the 'allDHitsWithScore' column is causing the warning, because only one of the clonotypes has a value in it.
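If only one known column is affected, an alternative to widening the guessing window is to pin that column's type with `col_types` and let readr guess the rest. This is a sketch assuming readr 2.0 or later and reading outside of repLoad; the toy data are hypothetical and only mirror the situation described above (one clone out of many with a value in `allDHitsWithScore`).

```r
library(readr)

# Hypothetical toy data: only the last of 1500 clones has a D-hit value.
n <- 1500
d_hits <- rep("", n)
d_hits[n] <- "TRBD1*00(40)"
tsv <- paste0(
  "cloneId\tallDHitsWithScore\n",
  paste0(seq_len(n), "\t", d_hits, collapse = "\n"),
  "\n"
)

# Force allDHitsWithScore to character; every other column is still guessed.
clones <- read_tsv(
  I(tsv),
  col_types = cols(allDHitsWithScore = col_character(), .default = col_guess())
)
clones$allDHitsWithScore[n]
```

Pinning the type avoids scanning the whole file during guessing, but it requires knowing the affected column names in advance.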

Best wishes, Christian

Multi_TRA_FS115_2_S150.clones_TRAD.tsv.zip

yls2g13 commented 5 months ago

Hey everyone, I'm facing a similar issue. Was this fixed in the latest update? Cheers, Nicole