lageIBUSP / Rsampling

R package to reproduce the Resampling Stats Add-in

Most randomizations are still slow #1

Closed piklprado closed 9 years ago

piklprado commented 9 years ago

The package functions are not optimized for speed, and they should keep the original logic of Resampling Stats. Still, we should check whether speed can be improved.

adalardo commented 9 years ago

I tried a loop instead of rlply and speed is indeed improved. I will work on a fork of the project without plyr.
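For illustration only, here is a rough sketch of a permutation test written with a preallocated base-R loop and no plyr. The data, the statistic and the helper name `perm_loop` are assumptions made up for this example; they are not code from Rsampling.

```r
# Hypothetical example data: two groups of 20 observations each.
set.seed(42)
values <- c(rnorm(20, mean = 10), rnorm(20, mean = 11))
groups <- rep(c("a", "b"), each = 20)

# Statistic of interest: difference between group means.
mean_diff <- function(x, g) mean(x[g == "b"]) - mean(x[g == "a"])

# Hypothetical helper: build the null distribution with a plain for loop.
perm_loop <- function(x, g, ntrials = 1000) {
  out <- numeric(ntrials)              # preallocate the result vector
  for (i in seq_len(ntrials)) {
    out[i] <- mean_diff(sample(x), g)  # shuffle values, keep labels fixed
  }
  out
}

null_dist <- perm_loop(values, groups)
observed  <- mean_diff(values, groups)
# One-sided p-value: share of shuffled statistics at least as large as observed.
p_value <- mean(null_dist >= observed)
```

Timing this against a plyr-based version with `system.time()` would show whether the loop helps for a given data set.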

piklprado commented 9 years ago

Great! Maybe boot and the matrix-shuffling functions in vegan can give better performance. Functions in the data.table package may also help.

andrechalom commented 9 years ago

I'd be very happy if the package did not depend on plyr, for several reasons: it simplifies the package installation, and makes the code structure more readable to people with experience in R but not in plyr.

While not related to the issue at hand, it would be great if the new functions accepted a "shiny" type of progress bar, as it would allow the shiny app to display the function progress.
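One possible shape for this, sketched under assumptions: the randomization function takes an optional callback and calls it once per trial, so the caller decides how progress is displayed. The function name `run_trials` and its `progress` argument are hypothetical and not part of Rsampling.

```r
# Hypothetical helper: run a randomization and report progress via a callback.
run_trials <- function(x, statistic, ntrials = 100, progress = NULL) {
  out <- numeric(ntrials)
  for (i in seq_len(ntrials)) {
    out[i] <- statistic(sample(x))
    if (!is.null(progress)) progress(i, ntrials)  # notify the caller
  }
  out
}

# Console usage with the text progress bar from the utils package.
pb <- txtProgressBar(min = 0, max = 100, style = 3)
res <- run_trials(rnorm(30),
                  statistic = function(v) cor(v, seq_along(v)),
                  ntrials = 100,
                  progress = function(i, n) setTxtProgressBar(pb, i))
close(pb)
```

A Shiny app could pass a different callback that updates its own progress indicator, while the randomization code stays free of any Shiny dependency.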

adalardo commented 9 years ago

I agree with Chalom. For me, for example, it is odd to see so many pipes (%>%) in R code; it looks like another language. I think we should use other packages only if it is extremely necessary or to improve processing speed. As Chalom said, it "makes the code structure more readable to people with experience in R" but not in a specific package, so for a better collaborative project we must keep the language as basic as possible.

piklprado commented 9 years ago

I totally agree too. I tried dplyr for two main reasons:

1 - In general it speeds up processing;
2 - group_by provides a simple way to allow restricted randomizations within strata (which is not the same as a statistic calculated among levels of a categorical variable, see here)

(1) was not true in our case. For (2) I could not find an easy solution for restricted randomizations from scratch. My best guess for now is to use the boot package, which has a strata argument and many other nice functionalities. boot returns an object of another class, which we would then convert to data frames.
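As a possible base-R alternative for restricted randomization, the sketch below shuffles a vector separately within each stratum using `ave()`. The helper name `shuffle_within` and the toy data are assumptions for illustration; this is neither the boot approach mentioned above nor code from Rsampling.

```r
# Hypothetical helper: permute x separately within each level of strata.
shuffle_within <- function(x, strata) {
  # Index-based shuffle avoids the sample() pitfall with length-one vectors.
  ave(x, strata, FUN = function(v) v[sample.int(length(v))])
}

set.seed(1)
x      <- c(1, 2, 3, 10, 20, 30, 100)
strata <- c("a", "a", "a", "b", "b", "b", "c")
shuffled <- shuffle_within(x, strata)
```

Values never leave their stratum: the first three positions of `shuffled` hold some ordering of 1, 2, 3, and the single value in stratum "c" stays in place.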

But yes, let's work on an unplyr branch!

As changes can affect man pages, vignettes and GUIs, I suggest being conservative and keeping the original structure, at least for now, and being extra careful with these dependencies while unplyering the code. In particular, I would strive to:

piklprado commented 9 years ago

Also, it is important to do the transition in a separate branch because the master is currently functional and will be used in our field course in the coming weeks.

adalardo commented 9 years ago

OK! Sounds great! I am not familiar with working on different branches in GitHub, but it sounds like a divergent version to be joined later. As we intend to work on both, how about having Rsampling_0.R and baseFunctions_0.R, as well as a "_0" suffix for all unplyr functions?

andrechalom commented 9 years ago

Ale, I don't think that is necessary, because of how the branching works. You can see an example of branching in the Rsampling-shiny right now. I have two branches, master and dev, that have almost the same files. The branch "master" is the "stable version", and on the branch dev I'm currently trying to reduce some code redundancy, but it also includes a new file "functions.R".

These branches can be selected on the GitHub web interface through the drop-down menu "branch:", next to the repository name, or from the command line:

git checkout master
git checkout dev

This way, I can test the development functions without disruption for the users that are using the master branch.

piklprado commented 9 years ago

Sure, that is the idea. And we can work safely if all of us submit changes through pull requests instead of pushing our changes directly to the GitHub repository at lageIBUSP. So far I have pushed my changes directly, but from now on I'll create a branch in my repo (piklp) and then submit pull requests from it. Ideally, PRs should be checked by at least one more person before being merged.


adalardo commented 9 years ago

Sorry for my ignorance. I have been working with GitHub a little but am still trying to understand the logic. I hope I can practice more in this project. So I will create a branch and work on it.

andrechalom commented 9 years ago

The noplyr version made a huge improvement in performance, so I believe this is not an issue anymore.

piklprado commented 8 years ago

---------- Forwarded message ---------- From: "Comunicação IB" comunica@ib.usp.br Date: 11/01/2016 14:28 Subject: Announcement of the selection-process notices for hiring Professores Contratados III (Professor Doutor) To: Cc:

Good afternoon,

We are circulating the following at the request of Prof. Marcos Silveira Buckeridge:

The notices for the selection processes to hire three Professores Contratados III (Professor Doutor) on fixed-term contracts, at 12 hours per week, for the Departamento de Biologia, FFCLRPUSP, were published in today's Diário Oficial, 06/01/2016.

1) Area of Ecological Restoration and in situ and ex situ Conservation of Plants.
2) Area of Environmental Analysis and Planning and Management of Protected Areas.
3) Area of Biological Evolution (at different levels of organization), history of evolutionary theory, and biogeography and evolution and biology of birds and mammals.

Applications will be open from 06 to 15/01/2016 at the Assistência Técnica Acadêmica of this Faculty, from 8:30 to 11:30 and from 13:30 to 16:30, Monday to Friday, on working days, as detailed in the attached notices.

Sincerely,

https://www.facebook.com/institutobiocienciasusp Communication and Quality Management, Instituto de Biociências. Extensions: +55 11 3091-7515 and 3091-9955. Fax: +55 11 3091-7576. Email: ib@usp.br

http://www.ib.usp.br/