jamesmullenbach / caml-mimic

multilabel classification of EHR notes
MIT License
278 stars 125 forks

Handling very large text field #5

Closed datduong closed 6 years ago

datduong commented 6 years ago

Hi,

I am having a problem with line 40 of training.py:

csv.field_size_limit(sys.maxsize)

The error says OverflowError: Python int too large to convert to C long.

What do you think is causing this problem? Is it the total number of words (not unique words, but the total word count)?

Can I ask what your memory usage was?

Thanks.

jamesmullenbach commented 6 years ago

Hm, I haven't seen this error before. I don't think that line has anything to do with the vocab size. Does this happen right on line 40, or does it come up during training?

When training CAML on mimic3 with the full label set, I use less than 2 GB of CPU memory and up to about 9 GB on the GPU.

datduong commented 6 years ago

Hi James, this must be something wrong with my Python. Even a simple call, without any other code, gives the error.

>>> import csv, sys
>>> csv.field_size_limit(sys.maxsize)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
>>> sys.maxsize
9223372036854775807

What is your sys.maxsize? I can just override this input value.
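
One possible explanation, not confirmed in this thread: csv.field_size_limit converts its argument to a C long, and on some platforms (64-bit Windows builds, for example) a C long is only 32 bits wide even though sys.maxsize is 2**63 - 1. The sketch below is only a diagnostic under that assumption; it compares the two values using ctypes.

```python
import ctypes
import sys

# Width of a C long on this platform, in bits (typically 32 or 64).
c_long_bits = 8 * ctypes.sizeof(ctypes.c_long)

# Largest value a signed C long of that width can hold.
c_long_max = 2 ** (c_long_bits - 1) - 1

print("sys.maxsize:", sys.maxsize)
print("C long max: ", c_long_max)

# If this prints False, csv.field_size_limit(sys.maxsize) would be
# expected to raise OverflowError on this interpreter.
print("sys.maxsize fits in a C long:", sys.maxsize <= c_long_max)
```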

datduong commented 6 years ago

I was able to solve this problem by just using a smaller number as the input to csv.field_size_limit.
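
For anyone hitting the same error, here is a minimal sketch of that workaround. It is not the code in training.py; the helper name raise_csv_field_limit is made up for illustration. It starts from sys.maxsize and halves the value until csv.field_size_limit accepts it, so the same code works whether or not sys.maxsize fits in a C long.

```python
import csv
import sys


def raise_csv_field_limit(start=sys.maxsize):
    """Set the csv field size limit to the largest value this platform accepts.

    Hypothetical helper: tries `start`, and halves it whenever the csv
    module reports that the value does not fit in a C long.
    """
    limit = start
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # Too large for a C long on this platform; try a smaller value.
            limit //= 2


applied = raise_csv_field_limit()
print("csv field size limit is now", applied)
```

Calling csv.field_size_limit() with no argument afterwards returns the limit currently in effect, which is a quick way to check that the new value was applied.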