psychoinformatics-de / remodnav

Robust Eye Movement Detection for Natural Viewing

ValueError: cannot convert float NaN to integer #38

Closed ribblockm closed 2 years ago

ribblockm commented 2 years ago

Hi!! I'm running remodnav "data/eyes.tsv" "data/events.tsv" 0.0185581232561 1000.0. No matter what I do to get rid of NaN values in my dataset, I get the following error:

Traceback (most recent call last):
  File "/Users/richardbarana/miniconda3/envs/lab/bin/remodnav", line 8, in <module>
    sys.exit(main())
  File "/Users/richardbarana/miniconda3/envs/lab/lib/python3.8/site-packages/remodnav/__init__.py", line 145, in main
    pp = clf.preproc(
  File "/Users/richardbarana/miniconda3/envs/lab/lib/python3.8/site-packages/remodnav/clf.py", line 855, in preproc
    data['x'][mask] = np.nan
ValueError: cannot convert float NaN to integer

The values in my dataset are np.float32 type. I tried, before running remodnav, to fix it by using: df[0] = np.floor(pd.to_numeric(df[0], errors='coerce')).astype('Int64') for both columns. Got the same error.

Thanks!!

adswa commented 2 years ago

Hey, which version of remodnav do you have installed? (Running, e.g., python -c 'import remodnav; print(remodnav.__version__)' should output this.) Would it be possible for you to share the data this error occurs with?

ribblockm commented 2 years ago

It is the 1.1.1 version.

You can find the files, the numpy array and the tsv file I made, here: https://github.com/rickybblock/usp_lab

Thanks!

ribblockm commented 2 years ago

When I converted the floating-point values to integers (in R; I tried something different), some values came back as NA (I think because they are too big, like 3.e+12). I replaced those NAs with 3000.

adswa commented 2 years ago

Thanks. I don't think any nan's in your dataset are at fault. On first sight it actually seems like the immediate cause for that error is a lack of nan's in the dataset, and the error occurs when we try to dilate samples around signal loss but do not find any signal loss. I wonder if this is only a symptom of a different problem, though. I tried to explore your data a bit, but didn't get too far from just looking at the values - could you just tell me a bit about the data? E.g., I'm a bit surprised about the X and Y value ranges. It seems the X-range is [-742, 1687567872] and the Y-range is [-825, 2128080057], and the mode of both X and Y values is exactly 3000 (which looks like it is the more probable upper bound and those really high max values are spurious?). Can you shed some light on what those coordinates are, what the screen size was, if the data has been normalized, what kind of viewing paradigm and eyetracker it was, or where on the screen coordinate (0, 0) is?
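For reference, the error message itself can be reproduced without remodnav: NumPy integer arrays cannot store NaN, so an assignment like the one on line 855 of clf.py fails whenever the 'x' column has an integer dtype. A minimal sketch with made-up values; that your columns were read as integers is my assumption, not something I verified:

```python
import numpy as np

# Made-up gaze samples with an integer dtype, as could result from a file
# that contains only whole numbers and no NaN (assumption).
x_int = np.array([100, 3000, 250])
mask = np.array([False, True, False])

try:
    x_int[mask] = np.nan  # same kind of assignment as in the traceback
    error = None
except ValueError as err:
    error = err

# A float copy can hold NaN, so the same assignment succeeds.
x_float = x_int.astype(float)
x_float[mask] = np.nan

print(type(error).__name__, int(np.isnan(x_float).sum()))
```

If that is what happens here, storing the coordinates as floats (with NaN for signal loss) should avoid the error.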

ribblockm commented 2 years ago

Hi!! I'm really glad you're helping me. They are eye movements captured while free-viewing videos, by VPixx EyeLink 1000. The screensize is (1680., 1050.) Those really big values are the blinks - when converting them to integer values they were producing the NaN's that I thought might be the problem. So I changed them to the value 3000. I think that is why it is the mode. The coordinates are in the VPixx standard, where the (0, 0) is at the center of the screen.

ribblockm commented 2 years ago

I just tried to insert some NaN values where the blinks are (instead of 3000), and it came back with the same error.

adswa commented 2 years ago

Could you maybe share a file where you didn't replace nans?

> The screensize is (1680., 1050.)

Thanks for the info! So are the units of your X and Y recordings screen pixels? And what are the negative values?

ribblockm commented 2 years ago

The NumPy file (.npy) is not adjusted; it is my raw data. I'll make a tsv from it and put it there for you.

Yes, the X and Y are screen pixels. Some of them are negative because (0, 0) is at the center. I have also converted them so that (0, 0) is at the top left, and I'll upload that to the repository as "eyes_cv.npy".

Thanks,

ribblockm commented 2 years ago

In https://github.com/rickybblock/usp_lab you can now find:
eyes_cv.npy and eyes_raw_topleft.tsv: (0, 0) at the top left, raw data (without nan's).
eyes_raw_center.tsv: the same as eyes.npy, raw data and (0, 0) at the center of the screen.
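Roughly, the conversion between the two conventions is just a shift by half the screen size. A sketch of the idea, where the function name and the y_up flag are made up for illustration, and whether the y axis of the centered data points up or down is an assumption you would need to check against the tracker documentation:

```python
import numpy as np

def center_to_topleft(x, y, width=1680, height=1050, y_up=False):
    """Shift centered pixel coordinates so that (0, 0) is the top-left corner.

    y_up=True assumes the centered y axis points upwards and flips it
    (hypothetical flag, not part of remodnav or the tracker software).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_tl = x + width / 2.0
    y_tl = (height / 2.0 - y) if y_up else (y + height / 2.0)
    return x_tl, y_tl

# Screen corners and center in centered coordinates
x_tl, y_tl = center_to_topleft([-840.0, 0.0, 840.0], [-525.0, 0.0, 525.0])
print(x_tl, y_tl)
```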

adswa commented 2 years ago

Cool, thanks for those files. If I take a file and convert all values that are larger than your screen size into NaN, things seem to work. I downloaded eyes_raw_topleft.tsv and did the following:

from remodnav.clf import EyegazeClassifier
from remodnav.tests.utils import show_gaze
import numpy as np
import matplotlib.pyplot as plt
# note: the file is slightly different from what remodnav expects,
# so if you want to give it to the program from the command line,
# you need to remove the first column (which currently holds the sample index),
# and the first row (which says 0,1).
# Here I'm doing this by hand with usecols and skip_header
data = np.recfromcsv('eyes_raw_topleft.tsv', delimiter='\t', names=['x', 'y'], usecols=[1, 2], skip_header=1)

# convert all values that are unreasonably large to nan
data[data['x'] > 1680] = np.nan
data[data['y'] > 1050] = np.nan

# initialize the classifier
clf = EyegazeClassifier(px2deg=0.0185581232561, sampling_rate=1000)
p = clf.preproc(data)
detectedEvents = clf(p)

# plot the result
show_gaze(pp=p, events=detectedEvents, sampling_rate=1000, coord_lim=(0, 1680), vel_lim=(0.001, 1000))
plt.show()

The classification succeeds, and the figure (this is zoomed into the first few seconds) doesn't look too bad:

[image: Figure_1]

Could you check if this works for you as well? You should be able to copy-paste this script into a python session, as long as you open that session in the directory that contains eyes_raw_topleft.tsv.
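In case np.recfromcsv is not available in your NumPy version (as far as I know, newer releases dropped it in favour of np.genfromtxt), here is a variant of the loading and masking step as a sketch. It uses a small synthetic table instead of the real file so it runs on its own, forces a float dtype so NaN can be stored, and also treats negative coordinates as off screen. That last part goes beyond what I did above and is an assumption about what counts as invalid:

```python
import io
import numpy as np

# Synthetic stand-in for eyes_raw_topleft.tsv: a header row, an index
# column, then x and y. The 3.0e12 row mimics a blink sample.
tsv = io.StringIO(
    "\t0\t1\n"
    "0\t100.0\t200.0\n"
    "1\t3.0e12\t3.0e12\n"
    "2\t500.0\t400.0\n"
)

# dtype=float makes both fields floating point, so they can hold NaN
data = np.genfromtxt(
    tsv, delimiter="\t", names=["x", "y"],
    usecols=[1, 2], skip_header=1, dtype=float,
)

# Mark every sample outside the 1680 x 1050 screen as signal loss
width, height = 1680, 1050
off_screen = (
    (data["x"] < 0) | (data["x"] > width)
    | (data["y"] < 0) | (data["y"] > height)
)
data["x"][off_screen] = np.nan
data["y"][off_screen] = np.nan

print(data)
```

With the real file you would pass 'eyes_raw_topleft.tsv' instead of the StringIO object.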

ribblockm commented 2 years ago

Thank you so much!!

I did get a figure; it doesn't look like yours, but got one. Can you explain to me how to restrict the time to plot? The detectedEvents is a list with a length smaller than the data, right?

I'll try again with the appropriate px2deg (0.035901486) and maybe it'll be better. But that already looks nice!
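For anyone reading along, this is how I understand px2deg is derived, based on the formula given in the remodnav README (size of one pixel in degrees of visual angle). It is a sketch only: the geometry below is illustrative and not my actual setup, and the function name is just a label I picked.

```python
from math import atan2, degrees

def px2deg(screen_size, viewing_distance, screen_resolution):
    """Degrees of visual angle covered by one pixel.

    screen_size and viewing_distance must share a unit (e.g. cm);
    screen_resolution is the pixel count along the same screen dimension.
    """
    half_angle = degrees(atan2(0.5 * screen_size, viewing_distance))
    return half_angle / (0.5 * screen_resolution)

# Illustrative values only (assumed, not measured on my setup)
example = px2deg(screen_size=26.5, viewing_distance=63.0, screen_resolution=1280)
print(example)
```

These illustrative values give about 0.0186 degrees per pixel, close to the value I used in my first command, which is why I need to recompute it for my own screen.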

Thanks,

adswa commented 2 years ago

> I did get a figure; it doesn't look like yours, but got one.

Probably because I zoomed in. The recording is long and the figure is small, so the full plot looks very dense. :)

> The detectedEvents is a list with a length smaller than the data, right?

Yes, it has one entry for each classified event. Each entry in detectedEvents in an interactive python session would be one line in the output tsv file you would get when you use remodnav from the command line.

> Can you explain to me how to restrict the time to plot?

You can index p with the samples you want to plot, e.g., for the first 10 seconds/10000 samples:

show_gaze(pp=p[:10000], events=detectedEvents, sampling_rate=1000, coord_lim=(0, 1680), vel_lim=(0.001, 1000))
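If you prefer to think in seconds rather than samples, the index is just time multiplied by the sampling rate. A small sketch with a hypothetical helper (window_slice is not part of remodnav); I haven't checked how show_gaze lines up the events when the window does not start at sample 0, so treat windows with a non-zero start as untested:

```python
def window_slice(start_s, stop_s, sampling_rate):
    """Return the slice of sample indices covering [start_s, stop_s) seconds."""
    start = int(round(start_s * sampling_rate))
    stop = int(round(stop_s * sampling_rate))
    return slice(start, stop)

# First 10 seconds at 1000 Hz, equivalent to p[:10000]
first_ten_seconds = window_slice(0, 10, 1000)
print(first_ten_seconds)
```

You would then pass pp=p[first_ten_seconds] to show_gaze instead of pp=p[:10000].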

As it seems that the problem was resolved, I'll close the issue :)

ribblockm commented 2 years ago

Thanks again! It really meant a lot!