rethomics / damr

Read TriKinetics' DAM data in R
http://rethomics.github.io

id should be factor, not a character #3

Open qgeissmann opened 6 years ago

qgeissmann commented 6 years ago

same issue as https://github.com/rethomics/scopr/issues/2

pepelisu commented 6 years ago

It's not specified in the tutorial that this should be a character; if it is loaded from the csv as an integer, load_dam2 complains.

Error in `[.data.table`(q, , .(regions = list(region_id)), by = c("path",  : 
  column or expression 1 of 'by' or 'keyby' is type list. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]

estebanbeck commented 5 years ago

Hi, a collaborator is having the same problem, but I don't get how to work around it. How do I need to save the monitor file? This is probably caused by passing the data through Excel to process (crop) it, right?

qgeissmann commented 5 years ago

hi, can she/he send the code they are using/a sample file? cheers

estebanbeck commented 5 years ago

With the sample files, it works perfectly. It doesn't work with their files, but they look really similar!

NicoleStephens96 commented 5 years ago

Could somebody please explain why this error occurs and how to correct the problem? The first few times I used Rethomics I did not receive this error, but now I do, even with the files that originally did not produce it.

qgeissmann commented 5 years ago

hi @NicoleStephens96, I probably have time to look into it this week. Can you send as much detail as possible regarding your code and the errors you get (i.e. what line of code raises the error)? Also, if you can send me your metadata file it could help me a lot :), thanks :)

NicoleStephens96 commented 5 years ago

Hi Quentin,

I appreciate this. I’m not sure why I am receiving the following error when trying to load. This is the same error I received when trying to practice with the practice data.

Error in `[.data.table`(q, , .(regions = list(region_id)), by = c("path", : column or expression 1 of 'by' or 'keyby' is type list. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]

The following is a folder with the metadata file and monitor text files.

Thanks again, Nicole

qgeissmann commented 5 years ago

hey after which line of code do you get the error, also which practice file/tutorial are you using :)? thanks

NicoleStephens96 commented 5 years ago

I am getting this error after the Dt <- load_dam(metadata) line. I used the data under DAM2 in the practice section “getting the data”.

Thanks! Nicole

pepelisu commented 5 years ago

I think the error comes from the Excel to csv conversion. When the metadata is created in Excel and the column region_id is interpreted as an integer (number), it is saved in the csv as a number. However, the rethomics function load_dam expects a character in that column. To solve the problem:

  1. Open the csv in Excel.
  2. Change the column format from number or general to text.
  3. Save the file again in csv format, selecting the option for text separated by commas.

To solve this issue permanently, I would recommend checking the type of the column and, if it is an integer, transforming it to character (if needed).

qgeissmann commented 5 years ago

Thanks @pepelisu! To be honest I am struggling a bit with this thread. I cannot reproduce the bug (yet). I don't think the region_id type is the issue. region_id is expected to be an integer all along (e.g. it is an integer in all the tests). In fact, linking will not work if region_id is a character. Therefore I would be surprised if this worked... To me it looks more like a data.table issue... @NicoleStephens96 does @pepelisu's trick solve anything for you?
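
For anyone who wants to see which type region_id ends up with after import, here is a minimal sketch. It uses base R and a made-up two-row csv so that it runs without extra packages; the real metadata in this thread is a data.table read from a file, and the column names below are only assumed to match it.

```r
# Hypothetical two-row metadata, inlined so the example is self-contained.
csv_text <- "file,region_id
Monitor6.txt,30
Monitor6.txt,31
"

# Force region_id to be read as text, mimicking a column saved as text in Excel.
metadata <- read.csv(text = csv_text,
                     colClasses = c(file = "character", region_id = "character"))

# Report the class of every column.
print(sapply(metadata, class))

# Coerce region_id to integer if it was read as anything else.
if (!is.integer(metadata$region_id)) {
  metadata$region_id <- as.integer(metadata$region_id)
}
```

This only inspects and converts the column type; it does not say anything about whether the type is what triggers the error.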

qgeissmann commented 5 years ago

My understanding is that the path to the file that is generated during the linking is a list on your platform and a character on mine. @NicoleStephens96, it would help a lot if you could run this for me:

# the normal linking of the tutorial metadata
metadata <- link_dam_metadata(metadata, result_dir = DATA_DIR)
metadata[, sapply(file_info, function(x) x$path)]

and

str(metadata)

and paste the results for both :)
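
A related check, as a rough sketch only: the error message complains about a grouping column of type list, so it can help to list which columns are lists. The example below builds a small base R data.frame by hand with a hypothetical path column (the paths are invented for illustration); it is not the structure that link_dam_metadata actually returns.

```r
# Hand-built stand-in for metadata; values are illustrative only.
metadata <- data.frame(file = c("Monitor6.txt", "Monitor6.txt"),
                       region_id = c(30L, 31L),
                       stringsAsFactors = FALSE)

# Hypothetical: store one path per row inside a list column.
metadata$path <- list("/data/Monitor6.txt", "/data/Monitor6.txt")

# Names of all columns that are lists rather than atomic vectors.
list_cols <- names(metadata)[vapply(metadata, is.list, logical(1))]
print(list_cols)

# If every element of a list column holds exactly one value, flatten it.
for (col in list_cols) {
  if (all(lengths(metadata[[col]]) == 1L)) {
    metadata[[col]] <- unlist(metadata[[col]])
  }
}
```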

NicoleStephens96 commented 5 years ago

I figured out the problem last night on my end and maybe this can help anyone else getting the same error:)

I was looking for differences between the data I got working and data I could not get working. What I found was, for some reason, extra rows were added to the metadata that contained blanks and N/A.


94: Monitor6.txt 2018-11-10 00:00:00 2018-11-16 00:00:00 6 30 M Sal 51
95: Monitor6.txt 2018-11-10 00:00:00 2018-11-16 00:00:00 6 31 M Sal 53
96: Monitor6.txt 2018-11-10 00:00:00 2018-11-16 00:00:00 6 32 M Sal 58
97: NA NA
98: NA NA
99: NA NA
file start_datetime stop_datetime machine_id region_id Sex Genotype

I simply deleted the extra rows using the code below and was able to link and load the data without a problem.

metadata <- metadata[-c(97, 98, 99)]
metadata

I appreciate everybody's help!

qgeissmann commented 5 years ago

thanks, that really helps! (also you can use metadata <- na.omit(metadata)). So were they just empty rows, or just missing values?

NicoleStephens96 commented 5 years ago

They were just empty added rows

qgeissmann commented 5 years ago

so Excel seems to keep ghost rows that are completely empty. I will add a check to remove all empty rows from the metadata
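
Until such a check exists, a possible workaround is to drop rows in which every field is missing. This is a sketch under assumptions: it uses base R and an inlined csv with two trailing empty rows, which is a guess at what an Excel export might look like, not a file from this thread.

```r
# Inlined csv with two trailing rows that contain only separators.
csv_text <- "file,start_datetime,stop_datetime,region_id
Monitor6.txt,2018-11-10 00:00:00,2018-11-16 00:00:00,30
Monitor6.txt,2018-11-10 00:00:00,2018-11-16 00:00:00,31
,,,
,,,
"

# Treat empty strings as missing so that blank cells become NA.
metadata <- read.csv(text = csv_text,
                     na.strings = c("", "NA"),
                     stringsAsFactors = FALSE)

# A row is empty when none of its cells holds a value.
is_empty_row <- rowSums(!is.na(metadata)) == 0

# Keep only rows with at least one value.
metadata <- metadata[!is_empty_row, ]
print(metadata)
```

Unlike na.omit(), which drops any row containing at least one NA, this keeps rows that are only partially filled, so a genuinely incomplete entry stays visible instead of disappearing silently.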

qgeissmann commented 5 years ago

@NicoleStephens96 can you upload an Excel-generated csv with empty rows at the end for me, so I can see exactly what they look like? Thanks

jtengjia commented 2 years ago

In my case, the problem was that the metadata had two identical lines. Normally, the result should look like this using the tutorial data: [image] But when my metadata contains repeated lines, it looks like this: [image] So my solution is:

metadata_final <- fread(paste0(DATA_DIR_final, "/metadata.csv")) %>% unique()

Using the unique() function removes the duplicated line, and then the error disappears. Hope this helps someone getting the same error.
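
To see which lines are repeated before removing them, something like the following may help. It is a sketch using base R and an inlined csv whose last two rows are identical; duplicated() and unique() also have data.table methods, so the same calls should apply to metadata read with fread.

```r
# Inlined csv in which the last two rows are identical.
csv_text <- "file,start_datetime,stop_datetime,region_id
Monitor6.txt,2018-11-10 00:00:00,2018-11-16 00:00:00,30
Monitor6.txt,2018-11-10 00:00:00,2018-11-16 00:00:00,31
Monitor6.txt,2018-11-10 00:00:00,2018-11-16 00:00:00,31
"

metadata <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# TRUE for every row that repeats an earlier row exactly.
is_dup <- duplicated(metadata)

# Show the offending rows before dropping them.
print(metadata[is_dup, ])

# Keep the first occurrence of each row.
metadata <- unique(metadata)
```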

ronjafrigard commented 5 months ago

The %>% unique() command worked for me! Thank you!