iDigBio / ridigbio

ridigbio -- an R interface to iDigBio's API (see http://www.idigbio.org/)
http://idigbio.github.io/ridigbio/

Speed up #13

fmichonneau closed this 9 years ago

fmichonneau commented 9 years ago

The interesting part is on lines 155-168.

I removed the nested loops and used lapply to vectorize the conversion from a list to a data.frame.
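The code at lines 155-168 isn't reproduced in this thread. The following is only a minimal sketch, assuming each record arrives as a named list of scalar field values, of how lapply can replace nested for loops when converting such a list into a data.frame. `records` and `records_to_df` are hypothetical names, not part of ridigbio.

```r
# Hypothetical parsed search response: a list of records, each a named
# list of scalar field values. Records may omit fields.
records <- list(
  list(uuid = "a1", genus = "Acer",    year = 1999),
  list(uuid = "b2", genus = "Quercus", year = 2004),
  list(uuid = "c3", genus = "Pinus")
)

# Build the data.frame one whole column at a time instead of filling it
# cell by cell in nested for loops. The outer lapply walks the fields;
# the inner one pulls that field out of every record.
records_to_df <- function(records, fields) {
  cols <- lapply(fields, function(f) {
    # A field missing from a record becomes NA, so every column
    # has exactly one value per record.
    vals <- lapply(records, function(rec) if (is.null(rec[[f]])) NA else rec[[f]])
    unlist(vals)  # assumes each value is a length-1 atomic vector
  })
  names(cols) <- fields
  as.data.frame(cols, stringsAsFactors = FALSE)
}

df <- records_to_df(records, c("uuid", "genus", "year"))
print(df)
```

Building each column in one step means the data.frame is created once at the end rather than being indexed into repeatedly inside a loop, which is a common source of slowness in R. Timing both versions with system.time() on the same response is an easy way to check the gain.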

In the process, I removed some things that I think were leftovers. In particular, I removed the code that handled the data slot, since nothing was done with it.

It passes the tests and is about 10 times faster for a query returning 1000 records.

I'll work on removing the other loops tomorrow.

mjcollin commented 9 years ago

Very cool. For me, user CPU time goes from 9.5 sec to 5.5 sec, a 42% reduction. Transfer time, at 3-4 sec over my cable modem and wireless, is still a little shorter than CPU time. I'll see what happens tomorrow on a gigabit connection.
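The 42% figure is the drop in user CPU time relative to the original timing; expressed as a ratio of the two timings, the same numbers correspond to roughly a 1.7x speed-up on this machine:

```latex
\frac{9.5\,\text{s} - 5.5\,\text{s}}{9.5\,\text{s}} = \frac{4}{9.5} \approx 0.42,
\qquad
\frac{9.5\,\text{s}}{5.5\,\text{s}} \approx 1.73
```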

The only other loop (I think) is in base.R:147, which formats the fields list. I set it up that way because I wanted the order of the output fields to match the order of the user's fields vector, and I wasn't sure whether lapply guarantees that. It runs once and iterates at most 250 times, but feel free to replace it if the ordering can be preserved.
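For reference, R's documentation says lapply returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X, so the order is kept. A minimal sketch follows; the formatting done at base.R:147 isn't shown in this thread, so `format_field` (which here just wraps each name in double quotes) is a hypothetical stand-in.

```r
# Hypothetical fields vector, in the order the user asked for
fields <- c("uuid", "scientificname", "country", "datecollected")

# Hypothetical stand-in for the per-field formatting done in base.R:
# here it simply wraps each field name in double quotes.
format_field <- function(f) paste0('"', f, '"')

# lapply() returns a list of the same length as its input, where the
# i-th result comes from the i-th field, so the user's order is kept.
formatted <- lapply(fields, format_field)
names(formatted) <- fields

# vapply() keeps the same order, returns a plain character vector, and
# checks that each result is a single string.
formatted_vec <- vapply(fields, format_field, character(1), USE.NAMES = FALSE)

print(unlist(formatted, use.names = FALSE))
```

If the formatted fields end up as a character vector anyway, vapply is the stricter choice, since it fails loudly if any call returns something other than one string.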

Deferring merge to see what awesomeness you add next.