trinker / gofastr

Make a DocumentTermMatrix faster

Not urgent: high mem usage with filter_documents() #8

Open arupakaa opened 7 years ago

arupakaa commented 7 years ago

Hi,

Your work has been a godsend! Just wanted to share a minor issue...

I'm working on a relatively large dataset (final DTM is ~25MB).

When I run filter_documents(), mem usage balloons up to 40GB, after which R crashes.

I'm running the Microsoft/MRO (latest) release on Mac OSX (latest).

One workaround, for the min = 1 case at least, is: .[unique(.$i), ]

Runs super-fast and doesn't seem to introduce any undue weirdness.
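
To make the idea concrete, here is a minimal base-R sketch of why the workaround is enough for min = 1. It assumes the DTM is stored in triplet form, where `i` holds the row (document) index of every non-zero cell, and that filter_documents() with min = 1 keeps documents whose row sum is at least 1; both are assumptions for illustration, and the toy matrix and variable names are hypothetical. It also uses sort(unique(i)) rather than plain unique(i), because the order of `i` depends on how the entries happen to be stored, and sorting keeps the documents in their original order.

```r
# Toy document-term counts; "doc2" is an empty document (all zeros).
m <- matrix(c(2, 0, 1,
              0, 0, 0,
              0, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"), c("a", "b", "c")))

# Mimic the triplet layout: one row index per non-zero cell.
nz <- which(m != 0, arr.ind = TRUE)
i <- nz[, "row"]

# Workaround: keep only rows that appear in the index vector at all.
# No row sums are computed, so nothing dense needs to be built.
keep <- sort(unique(i))
filtered <- m[keep, , drop = FALSE]

# Reference: the row-sum filter on the dense matrix gives the same result.
reference <- m[rowSums(m) >= 1, , drop = FALSE]
stopifnot(identical(filtered, reference))
```

For counts, a row has a sum of at least 1 exactly when it has at least one stored entry, which is why no sums are needed in this case. For larger values of min the shortcut no longer applies.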

Hope that helps.

trinker commented 7 years ago

Thank you for submitting this. I'll look into it.