rethomics / damr

Read TriKinetics' DAM data in R
http://rethomics.github.io

Loading failure #17

Closed BenjaminHouot closed 5 years ago

BenjaminHouot commented 5 years ago

Hi, I'm trying to run the rethomics packages in R without success. I'm stuck at the loading step. I have tried the data sets from both the Geissmann et al. 2018 paper and the DAM2 data tutorial, as well as my own metadata file, and I get the same output each time. The previous steps all look normal, but loading fails. Here is the error message I get when I run the following command:

dt <- load_dam(metadata)
Error in find_dam_first_last_lines(path, start_datetime, stop_datetime,  : 
  No data in selected date range

Any ideas about where this issue is coming from please?

Here is my sample:

Monitor1.txt

metadata_origin02.xlsx

qgeissmann commented 5 years ago

Hi, thanks for reporting your troubles. Could you paste your whole code, including the "linking" step? This would help me try to reproduce the issue :)

BenjaminHouot commented 5 years ago

Here is my script in RStudio:

library(devtools)
library(behavr)
library(damr)
library(ggetho)

DATA_DIR <- "C:/Users/Serafino/Desktop/Trial_V01"
setwd(DATA_DIR)
metadata <- fread("metadata.csv")
metadata
metadata <- link_dam_metadata(metadata, result_dir = DATA_DIR)
metadata
dt <- load_dam(metadata)

In my previous message I sent you my .xlsx file to show how I built my metadata file, before the conversion to CSV. It is just an example; I have the same issue with the file from the tutorial.

Thanks for your help

jaspwn commented 5 years ago

Hi Benjamin,

If you save your .xlsx file as .csv using Excel, you will not get a correct metadata.csv file: the Excel file you linked has all the required metadata fields in the first column, which results in a single-column .csv file rather than the desired six columns.

Your code works for me after I copied your .xlsx file to a text editor and saved it as a .csv file. See the difference between the files below.

excel_exported_metadata.txt text_editor_copied_metadata.txt
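
A quick way to catch the single-column problem before calling link_dam_metadata() is to check the column names right after reading the file. The sketch below is only an illustration: `check_metadata_columns` is a hypothetical helper (not part of damr), the list of expected columns is an assumption based on the metadata files in this thread, and it uses base R's read.csv() on a temporary file so that it runs without extra packages. The same check should work on a table read with fread().

```r
# Columns the metadata in this thread is expected to contain (an assumption;
# treatment and baseline_days are user-defined extras and are not checked).
expected_cols <- c("file", "start_datetime", "stop_datetime", "region_id")

# Hypothetical helper, not part of damr: stop early if columns are missing.
check_metadata_columns <- function(metadata) {
  missing_cols <- setdiff(expected_cols, names(metadata))
  if (length(missing_cols) > 0) {
    stop("Metadata is missing columns: ",
         paste(missing_cols, collapse = ", "),
         " (found ", ncol(metadata), " column(s))")
  }
  invisible(TRUE)
}

# A file in which every row is wrapped in quotes is read as ONE column.
bad_file <- tempfile(fileext = ".csv")
writeLines(c('"file,start_datetime,stop_datetime,region_id,treatment,baseline_days"',
             '"Monitor1.txt,2018-12-04 08:00:00,2018-12-09,1,control,1"'),
           bad_file)
bad <- read.csv(bad_file, stringsAsFactors = FALSE)

# The same content without the surrounding quotes is read as six columns.
good_file <- tempfile(fileext = ".csv")
writeLines(c("file,start_datetime,stop_datetime,region_id,treatment,baseline_days",
             "Monitor1.txt,2018-12-04 08:00:00,2018-12-09,1,control,1"),
           good_file)
good <- read.csv(good_file, stringsAsFactors = FALSE)

ncol(bad)   # number of columns in the quoted file
ncol(good)  # number of columns in the plain file
check_metadata_columns(good)
```

Calling check_metadata_columns(bad) stops with a message listing the missing columns, which is easier to interpret than an error raised later at the loading step.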

Wenfone commented 5 years ago

Hi,

I also face this problem when loading the practice data from the 'damr_tutorial' file.

The error is:

Error in find_dam_first_last_lines(path, start_datetime, stop_datetime, : 
  No data in selected date range

Here is my R script:

library(behavr)
library(damr)
library(ggetho)
DATA_DIR <- "C:/Users/cwf/Documents/Rethomics Tutorial/DAM2 data in practice"
list.files(DATA_DIR, pattern= ".txt|.csv")
setwd(DATA_DIR)
metadata <- link_dam_metadata(metadata, result_dir = DATA_DIR)
metadata
dt <- load_dam(metadata)
summary(dt)

Everything is fine except the load step. Attached are the files I used: Monitor11.txt Monitor14.txt Monitor64.txt

Is there a problem with the time format in the metadata.csv file or in the Monitor files?

Thanks!

BenjaminHouot commented 5 years ago

Hi, I tried your txt files (after saving them as .csv files) with my script, but I'm still stuck at the loading step. When I open your files, text_editor_copied_metadata contains all the information on one line, and excel_exported_metadata looks like this:

"file,start_datetime,stop_datetime,region_id,treatment,baseline_days"
"Monitor1.txt,2018-12-04 08:00:00,2018-12-09,1,control,1"

With the latter, I get a new error message:

Error in `[.data.table`(q, , .(regions = list(region_id)), by = c("path", : 
  column or expression 1 of 'by' or 'keyby' is type list. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]

Then I removed the " characters and got the same error message as before. I also tried replacing , with ; because my version of Excel uses ; as the separator when converting to CSV. In that case I get a CSV file where each variable is in its own column, but the date is different, and my script still does not run. I also tried starting from an Excel file (Matrix_metadata) where each variable is in its own column. I saved it as a txt file (data.txt) and converted it to CSV, and again my script does not run. It's very strange: I have tried several approaches without finding a solution. Even the practice data from the tutorial does not run, although I did not open those files, to prevent any modification by my version of Excel. If you have any other ideas I can test, please let me know. Could you also send me a new example data set, please?

data.txt Matrix_metadata.xlsx Monitor1.txt

BenjaminHouot commented 5 years ago

Hi,

I found a way to run my script: I simply used my personal computer, a MacBook Pro running macOS Mojave (14.42.2). The difference I see between my Mac and the PC in my lab appears when I download the files from the damr tutorial. On my Mac, the structure of the metadata file is correct, that is, one variable per column, whereas on my PC there is only one column. Even when I create my own metadata file in Excel with the right layout, I still have the same issue on the PC. You told me that you work on a Unix system without any problem. Have you tried testing on a PC to see whether you get the same issue? It seems to me that Windows introduces some specific problem in how the files are handled. It's very strange, and I hope you have a good tip to fix it.

See you soon

jaspwn commented 5 years ago

Hi Benjamin,

You are almost certainly experiencing file encoding problems with the Windows version of Excel. As previously mentioned, you do not want each row wrapped in quotation marks, and when you removed them your next error was probably caused by different end-of-line characters. These are invisible characters that mark the end of each line: Windows uses both a carriage return (CR) and a line feed (LF), whereas UNIX systems use just LF.
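
To check which line endings a file actually has, you can look at its raw bytes from R. This is a minimal sketch: `has_carriage_returns` is a hypothetical helper (not part of damr), and the two temporary files only stand in for a metadata file saved on UNIX and on Windows.

```r
# Hypothetical helper: TRUE if the file contains any carriage return (0x0D) byte.
has_carriage_returns <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  any(bytes == as.raw(0x0d))
}

# Write two small files in binary mode so that R does not translate
# the line endings itself.
lf_file <- tempfile()
crlf_file <- tempfile()
writeBin(charToRaw("a,b\n1,2\n"), lf_file)        # UNIX-style LF
writeBin(charToRaw("a,b\r\n1,2\r\n"), crlf_file)  # Windows-style CR+LF

has_carriage_returns(lf_file)
has_carriage_returns(crlf_file)
```

Running has_carriage_returns("metadata.csv") on your own file tells you whether it was saved with Windows line endings.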

One point of clarification: the text_editor_copied_metadata.txt file I created does not need to be re-opened in Excel and saved as a .csv file; it should work immediately with fread() in R. Another tip: in your Windows settings you can change the default list separator so that Excel exports CSVs separated by , rather than ; (see this link https://optimalbi.com/blog/2015/07/16/how-to-export-an-excel-file-to-pipe-delimited-file-rather-than-comma-delimited-file/ )
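
If changing the Windows setting is not an option, a semicolon-separated export can also be read by stating the separator explicitly. The sketch below uses base R's read.table() on a temporary file whose contents are made up to mirror your metadata; fread() also accepts a `sep` argument, so fread("metadata.csv", sep = ";") should behave similarly. The date check at the end is an extra assumption about what you want: it only verifies that Excel has not rewritten the dates into another format.

```r
# Temporary file standing in for an Excel export that uses ";" as separator.
semi_file <- tempfile(fileext = ".csv")
writeLines(c("file;start_datetime;stop_datetime;region_id;treatment;baseline_days",
             "Monitor1.txt;2018-12-04 08:00:00;2018-12-09;1;control;1"),
           semi_file)

# State the separator explicitly instead of relying on the default ",".
metadata <- read.table(semi_file, header = TRUE, sep = ";",
                       stringsAsFactors = FALSE)
names(metadata)

# Check that the dates still start with YYYY-MM-DD after the Excel round trip.
dates_ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", metadata$start_datetime)
all(dates_ok)
```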

But none of this solves your problem. I think the best solution is to create your metadata data structure in R in the first place; for me this is quicker and easier than creating it in Excel.

Here is example code showing how I would create the metadata file you linked in data.txt:

metadata <- data.table(file = rep("Monitor1.txt", times = 32),
                       start_datetime = rep("2018-03-12 08:00:00", times = 32),
                       stop_datetime = rep("2018-10-12", times = 32),
                       region_id = rep(1:32),
                       treatment = rep("control", times = 32),
                       baseline_days = rep(1, times = 32))

Good luck!

Georges-Farkouh commented 5 years ago

Hi, I am having the same issue at the loading step:

> dt <- load_dam(metadata)
Error in find_dam_first_last_lines(path, start_datetime, stop_datetime,  : 
  No data in selected date range

I have tried both macOS High Sierra and Windows 10 and still get the same problem. The DAM system data provided with the tutorial works; the problem appears when I use my own data. Here is an example of the R code and the data text file that I am trying to analyse.

DATA_DIR <- "/Volumes/Desktop/MonitorRetho"

list.files(DATA_DIR, pattern= "*.txt|*.csv")
setwd(DATA_DIR)

metadata <- data.table(file = rep("MonitorCtM020.txt", times = 32), 
                       start_datetime = rep("2019-06-20 09:00:00", times = 32), 
                       stop_datetime = rep("2019-06-22", times = 32), 
                       region_id = rep(1:32))

metadata <- link_dam_metadata(metadata, result_dir = DATA_DIR)
metadata

dt <- load_dam(metadata)
summary(dt)

MonitorCtM020.txt

jaspwn commented 5 years ago

Hi Georges,

Your code and file work fine on my system - Ubuntu 18.04.

If you view the example file that works and your one with hidden characters visible, do you see any differences?

Jason

Georges-Farkouh commented 5 years ago

[Two screenshots attached: Capture d’écran 2019-06-27 à 13 56 20, Capture d’écran 2019-06-27 à 13 56 52]

I have checked the hidden characters and there are no differences. I made sure that the text file is encoded as UTF-8 with Unix LF line breaks (like the tutorial file), but the error still occurs.

Georges

Georges-Farkouh commented 5 years ago

Hello,

As an update, I was able to fix the problem by changing the system language to English (US); it was previously set to French. This fixes the problem (tested on macOS and Ubuntu Linux 18.04).

qgeissmann commented 5 years ago

Thanks so much for the update @Georges-Farkouh ! that gives us good insight on how to reproduce and fix the issue!

qgeissmann commented 5 years ago

Thanks @BenjaminHouot, @Georges-Farkouh, @Wenfone and @jaspwn for helping and reporting! In the end, the cause was that non-English installations of R failed to read the date (through readr). I will send the patched package to CRAN today!
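
For anyone curious about the mechanism, here is a rough illustration in base R. It does not reproduce damr's internal parsing (which goes through readr); it only shows how an English month abbreviation depends on the time locale. The date string is an assumed example of the day, month abbreviation and two-digit year layout, not a line taken from one of the attached files. Under a French LC_TIME setting the same as.Date() call would typically return NA, because `%b` then expects French month abbreviations.

```r
# Example date with an English month abbreviation (illustrative value).
dam_date <- "9 Dec 16"

# Remember the current time locale so it can be restored afterwards.
old_locale <- Sys.getlocale("LC_TIME")

# "C" is the portable locale; it uses English month names on every platform.
Sys.setlocale("LC_TIME", "C")
parsed <- as.Date(dam_date, format = "%d %b %y")
parsed

# Restore the previous setting.
Sys.setlocale("LC_TIME", old_locale)
```

Updating damr to the patched release remains the actual fix; setting the locale as above is only a possible stop-gap for an unpatched installation.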

Georges-Farkouh commented 5 years ago

Thank you @qgeissmann for the quick identification of the origin of the issue and patching.

mantouyangmeng commented 5 months ago

Hi, I'm trying to run the damr package, and when I run load_dam(metadata) I keep getting "Error in find_dam_first_last_lines(path, start_datetime, stop_datetime, : No data in selected date range". I've searched for ways to fix this (such as following the help provided in the error), but it still reports the error. I was wondering whether you have encountered this problem?