Open stoeter opened 1 year ago
Correction: the issue cannot be solved by pre-sorting. It looks like the last row is also parsed as the string "NA" instead of a missing value (NA).
The code pasted from the command line was not displayed properly in the previous comment:
> tail(kIn)
          plateRow plateColumn Name (Sense) Sense sequence Name (Antisense) Reverse complement
Row90#1_?        8           7         <NA>           <NA>             <NA>               <NA>
Row91#1_?        8           8         <NA>           <NA>             <NA>               <NA>
Row92#1_?        8           9         <NA>           <NA>             <NA>               <NA>
Row93#1_?        8          10         <NA>           <NA>             <NA>               <NA>
Row94#1_?        8          11         <NA>           <NA>             <NA>               <NA>
Row95#1_?        8          12           NA             NA               NA                 NA
          sample number sample number.text sample type plate number plate number.text
Row90#1_?            NA               <NA>         H2O            2               002
Row91#1_?            NA               <NA>         H2O            2               002
Row92#1_?            NA               <NA>         H2O            2               002
Row93#1_?            NA               <NA>         H2O            2               002
Row94#1_?            NA               <NA>         H2O            2               002
Row95#1_?            NA               <NA>         H2O            2               002
          sample number in plate plateRow384 plateColumn384 plateNumber384
Row90#1_?                     NA          15             14              1
Row91#1_?                     NA          15             16              1
Row92#1_?                     NA          15             18              1
Row93#1_?                     NA          15             20              1
Row94#1_?                     NA          15             22              1
Row95#1_?                     NA          15             24              1
> kIn$`Name (Sense)`[180:192]
[1] NA NA NA NA NA NA NA NA NA NA NA NA "NA"
Sorry, the bug is even more severe. It looks like the last row with a value that is followed by a missing value is parsed as a missing value. In this example there are 10 sequences with their names. seq_010 was replaced by a missing value, NA, in kIn.
In addition, what is very strange is that the column sampleNumber.text, which is also a text column, looks fine!?
At this point it is also not clear to me what NA in R means:
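For reference, here is a small sketch of how R itself distinguishes a missing value from the two-character string "NA". The vector `x` and the data frame `df` are made up for illustration and are not taken from the attached workflow.

```r
# Toy character vector: a real missing value, the literal string "NA", and a normal value.
x <- c(NA, "NA", "seq_001")

# is.na() is TRUE only for the real missing value.
missing_flags <- is.na(x)

# Comparing with "NA" yields NA for the missing value and TRUE for the literal string.
string_flags <- x == "NA"

# When a data frame is printed, a missing string is shown as <NA>,
# while the literal string "NA" is shown as plain NA (as in the last row of tail(kIn)).
df <- data.frame(name = x, stringsAsFactors = FALSE)
printed <- capture.output(print(df))
print(df)
```

So in the printed table, `<NA>` marks cells that are really missing, and a bare `NA` in a text column marks a cell that holds the string "NA".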
Also saw this today again in newer software versions. To me this is a severe bug, because it changes the data where it should not, and the user might not even notice! (=> changed Priority)
Win11 Pro KNIME 4.7.8 R version 4.3.2 (2023-10-31 ucrt) Rserve 1.8-13
Current example of a string column passed from KNIME to R and back to KNIME (all "NA" entries were previously ?, i.e. missing values in KNIME)
OK, just realized the full extent of the bug again:
a) certain missing values are passed to R as "NA" (minor bug; annoying, but one could correct this)
b) certain values (strings in rows that are followed by a missing value) are passed to R as NA (missing value in R) and are therefore lost! (severe bug!)
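A possible stopgap for (a), and a way to at least notice (b), could look like the sketch below. The helper names `fix_na_strings` and `count_values` are hypothetical, and the demo table is invented. Note the caveat: a cell whose genuine content is the text "NA" would also be turned into a missing value, and values lost through (b) cannot be recovered on the R side.

```r
# Hypothetical helper: replace literal "NA" strings in all character columns
# with real missing values. Cells that genuinely contain the text "NA" are
# converted as well, so this is only safe if that text never occurs as data.
fix_na_strings <- function(df) {
  for (col in names(df)) {
    v <- df[[col]]
    if (is.character(v)) {
      v[!is.na(v) & v == "NA"] <- NA_character_
      df[[col]] <- v
    }
  }
  df
}

# Hypothetical helper: number of real values in a column (neither missing nor "NA").
# Comparing this with the count known from the KNIME side can reveal lost values.
count_values <- function(v) sum(!is.na(v) & v != "NA")

# Invented demo table with the same column name as in the report.
kIn_demo <- data.frame(
  `Name (Sense)` = c(NA, "NA", "seq_001", "seq_002"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

fixed <- fix_na_strings(kIn_demo)
n_values <- count_values(kIn_demo$`Name (Sense)`)
```

In the example from this report, the input holds 10 named sequences, so a count of 9 on the R side would show that one value (seq_010) was lost.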
When a table contains text columns that start with missing values in the first rows, these are correctly parsed to NA in R. However, when a row contains a text value, the previous row is parsed as "NA". As a consequence, the table returned to KNIME contains "NA" entries where there was a missing value before the R snippet. This is a bug.
I then checked on the R side: this already happens in the parsed kIn table (from the data provided below):
> kIn$`Name (Sense)`[1:20]
 [1] NA NA NA NA NA NA NA NA NA
[10] NA NA NA "NA" "seq_001" "seq_002" "seq_003" "seq_004" "seq_005"
[19] "seq_006" "seq_007"
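The pattern in this output can be checked with a few lines of R. This is only a sketch: the vector `x` is rebuilt by hand from the 20 values shown above instead of being read from kIn.

```r
# The 20 values from the console output: 12 missing values, one literal "NA",
# then seq_001 to seq_007.
x <- c(rep(NA_character_, 12), "NA", sprintf("seq_%03d", 1:7))

# Positions that hold the literal string "NA" rather than a missing value.
na_string_pos <- which(!is.na(x) & x == "NA")

# The entry directly after each of those positions.
following <- x[na_string_pos + 1]

print(na_string_pos)
print(following)
```

Here the only literal "NA" sits directly before the first real value, which matches the description that the row preceding a text value is affected.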
Sorting the table in KNIME in such a way that missing values are at the end of the table solves the issue for that column.
Attached KNIME workflow with data showing the problem...
Win7 KNIME 4.5.2 R version 3.6.1 (2019-07-05) Rserve 1.8-6
R snippted problem with NAs.zip