r-lib / xml2

Bindings to libxml2
https://xml2.r-lib.org/

Migration guide for XML package #246

Open nuest opened 5 years ago

nuest commented 5 years ago

The XML package is orphaned on CRAN. It still gets updates from time to time, but I am unsure whether my current problem, a segfault the moment Rcpp is loaded, will be fixed.

Is there a migration guide for users switching from XML to xml2, or are you aware of any packages that have made this switch?

jennybc commented 5 years ago

There is no such migration guide within the xml2 docs. There are definitely packages that have done this, but I can't think of any off the top of my head. You might be able to turn some up with clever GitHub searches. But it really sounds like a good topic for https://community.rstudio.com or even an #rstats tweet.

nuest commented 5 years ago

Thanks for the ideas; I went ahead with the first two. Would you mind leaving this issue open for a while so we can collect some more information?

The migration will be a big step for my package, so I'll need some time to schedule it. It will probably be so much work that spending a couple of extra hours writing down lessons learned won't add much. Still, I hope to find other contributors to such a migration guide and to collect as much information here as possible.

Process/Notes

jennybc commented 5 years ago

Here are some of the main PRs where the switch happened in googlesheets:

nuest commented 5 years ago

I just published my notes on the transition to xml2 in this gist: https://gist.github.com/nuest/3ed3b0057713eb4f4d75d11bb62f2d66

The source code changes are best seen via the commits mentioning the issue https://github.com/52North/sos4R/issues/42

The notes include a table of corresponding functions, the changes I could "automate" with regexes, and the changes I made manually. They are pretty raw, not yet a friendly "guide". I'm open to feedback and suggestions on how to proceed. Given the large number of packages that depend on XML, an orphaned package, it might be worth reaching out to some of them to advertise xml2.
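To give a flavour of what such a function table covers, here is a minimal sketch (not taken from the gist) of how a few common XML calls map onto xml2. It assumes only that xml2 is installed; the XML calls appear in comments for comparison and are not executed, and the example document is made up.

```r
# Sketch: a few common XML-package calls and rough xml2 equivalents.
# Assumes xml2 is installed; XML calls are shown in comments only.
library(xml2)

txt <- "<root><item id='a'>first</item><item id='b'>second</item></root>"

# XML: doc <- XML::xmlParse(txt, asText = TRUE)
doc <- read_xml(txt)

# XML: nodes <- XML::getNodeSet(doc, "//item")
nodes <- xml_find_all(doc, "//item")

# XML: sapply(nodes, XML::xmlValue)
values <- xml_text(nodes)

# XML: sapply(nodes, XML::xmlGetAttr, "id")
ids <- xml_attr(nodes, "id")

# XML: XML::xmlName(XML::xmlRoot(doc))
root_name <- xml_name(xml_root(doc))

print(values)
print(ids)
print(root_name)
```

One practical difference: xml2 functions such as xml_text() and xml_attr() are vectorised over node sets, so many of the sapply() loops common in XML-based code can simply be dropped.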

georgevbsantiago commented 5 years ago

I currently use the XML package only because of the readHTMLTable function. The xml2 package does not have a function to read tables in HTML files, correct? I've already tried the rvest::html_table function, but readHTMLTable is 10x faster and produces a cleaner data table.

I've attached an example HTML file; the table with id="tableResult" can be used for testing.

html Table.zip
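For readers arriving with the same question, the rvest route for a single table looks roughly like this. It is a minimal sketch assuming rvest >= 1.0.0 (which provides html_element()) is installed; the HTML below is a made-up stand-in for the attached file, and the sketch makes no claim about matching readHTMLTable's exact output or speed.

```r
# Sketch: extract one HTML table by its id with rvest.
# Assumes rvest >= 1.0.0; the HTML is a hypothetical stand-in.
library(rvest)

html_txt <- '<html><body>
<table id="tableResult">
  <tr><th>name</th><th>value</th></tr>
  <tr><td>a</td><td>1</td></tr>
  <tr><td>b</td><td>2</td></tr>
</table>
</body></html>'

# read_html() comes from xml2 and is re-exported by rvest;
# a string containing markup is parsed as literal HTML.
page <- read_html(html_txt)

# Select the table by CSS id, then convert it to a tibble.
# With XML, the usual call was XML::readHTMLTable() on the parsed page.
tbl <- html_table(html_element(page, "#tableResult"))

print(tbl)
```

Because the first row consists of th cells, html_table() uses it as the header by default.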

jimhester commented 5 years ago

I think a useful TDD task would be to take @nuest's guide and turn parts of it into a vignette / pkgdown article.

georgevbsantiago commented 3 years ago


@hadley has made a spectacular improvement to the performance of the rvest::html_table function! See here.

You may want to include rvest::html_table in the "Switching from XML to xml2" tutorial.