Closed alxndrkalinin closed 4 years ago
That's interesting. For some reason the `sirna` column in the Kaggle dataset has changed. Previously `sirna` was just an integer, with the untreated class being 1138. Now a `sirna_` prefix has been added and the sirna ids have also changed, so 1108 is no longer the split point between normal and control sirnas. Try downloading the dataset from the official webpage (https://www.rxrx.ai/rxrx1) and it should work.
The dataset from the official webpage also has a slightly different format, but the numbering of the sirnas is correct. You can either find a mapping between the two, or assign indices <1108 to all treatment sirnas (i.e. those from `train.csv`) and indices >=1108 to the control sirnas.
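A minimal sketch of the second option, in case it helps anyone else. It is not the repository's code: `build_sirna_index` and its arguments are hypothetical names, and it assumes you have already collected the raw `sirna` labels from `train.csv` and from the control files as plain lists of strings. It only guarantees the split (treatments below 1108, controls at or above), not the original numbering.

```python
def build_sirna_index(treatment_labels, control_labels, n_treatments=1108):
    """Map raw sirna labels to integer ids.

    Treatment labels get ids in [0, n_treatments); control labels get
    ids >= n_treatments. The order within each group is just sorted
    label order, which is an arbitrary (assumed) choice.
    """
    treatments = sorted(set(treatment_labels))
    if len(treatments) != n_treatments:
        raise ValueError(
            f"expected {n_treatments} distinct treatment sirnas, "
            f"found {len(treatments)}"
        )

    # Any label that already counts as a treatment is not re-added as a control.
    controls = sorted(set(control_labels) - set(treatments))

    mapping = {label: idx for idx, label in enumerate(treatments)}
    for offset, label in enumerate(controls):
        mapping[label] = n_treatments + offset
    return mapping
```

With pandas, the resulting dict can be applied to the column with something like `df["sirna"] = df["sirna"].map(mapping)`. Which control ends up with which id (including the untreated one) depends on the sort order here, so adjust if the code expects a specific id for a specific control.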
Thanks a lot! It seems like I was able to map IDs and the training works now.
Hi, thanks for sharing your code!
I'm trying to run training with the first command (with or without `--all-controls-train 0`), and it seems like there is an issue with parsing the sirna numbers when reading them from the .csv file.
I noticed you're using 1138 later as an extra id, so I assumed it's for the `UNTREATED` controls and tried parsing the sirnas on that basis, but I still ran into another assertion.
Can you please help with understanding how to parse sirna ids?