rstudio / blogdown

Create Blogs and Websites with R Markdown
https://pkgs.rstudio.com/blogdown/

clean_site() catastrophically deleted many files it should not have #518

Closed joftius closed 3 years ago

joftius commented 3 years ago

Under the Build tab I clicked on More and then Clean All. It proceeded to delete my entire site, including the git repository and even other files that were in the same folder.

I have a folder named "source" that contains all the pre-build files, and I build to a different folder located at ../githubname/ from there. Clean All deleted the githubname folder completely, including a lot of other important files like various course notes that were not part of the bookdown output and were not added to my github repository, so I (apparently) have no way of recovering them now. The working directory, in the console at least, was the source folder, so I don't know why it would have deleted the contents of a different folder.

I may have lost a lot of valuable work permanently and I hope to alert the project maintainers to prevent this happening to others.

I can't give a reprex because I can't recover the state before the issue occurred.

joftius commented 3 years ago

Looking at the source of clean_site() I can see the reason for this now. The site generator$clean() function includes output_dir. So anyone who, like me, compiles their site to a (git repo) folder that includes other files which are not part of the blogdown project is in danger of having those files wiped out by this handy little "clean" button.

cderv commented 3 years ago

Hi,

Thanks for the report. That is indeed unfortunate. Can you share your project folder structure so that we understand for sure how to prevent this case?

My understanding is that cleaning a blogdown website project will remove

Is the issue that you set publishDir to a directory which contains things other than the Hugo website? This dir corresponds to what Hugo expects:

publishDir (“public”) The directory to where Hugo will write the final static site (the HTML files etc.). blogdown considers that this directory is safe to remove when you want to clean the website, as it is the result of the Hugo build from the source files.

Which of these exactly were wrongly deleted? I am trying to understand how you organized your project so that we do not misunderstand the issue.

Thanks !

yihui commented 3 years ago

@joftius I'm extremely sorry about what happened! We should have been much more conservative in terms of cleaning sites, since this operation cannot be undone. I feel the desirable behavior of the Clean All button is to not actually delete the files and folders, but only tell users which files/folders are to be deleted. If users really want to delete them, they must use the command line, e.g., type something like rmarkdown::clean_site(force = TRUE) (by default, force = FALSE) in the R console.

joftius commented 3 years ago

I found a backup of most of my files on a different computer so all is not lost for me.

@cderv and @yihui: If I had read "blogdown considers that this directory is safe to remove when you want to clean the website" then I would have known not to structure my folders that way. I don't see how else to host a website using a git repository, e.g. on github sites, where I want to also keep some files that are not all managed through blogdown.

cderv commented 3 years ago

I agree, we should be more conservative. @yihui, the argument is preview (not force), and the current default is rmarkdown::clean_site(preview = FALSE). The IDE is indeed calling rmarkdown::clean_site(). Maybe it should call it with preview = TRUE, or we could set the default to TRUE in rmarkdown::clean_site().

@joftius about hosting a Hugo website, there are several examples in the community. One of the most used services that blogdown works well with is Netlify. With this service, you don't need to build the website yourself: you just publish the source on GitHub and Netlify will build the website using Hugo for you. But GitHub Pages also works fine, in a setup where you publish to the gh-pages branch or host from a folder within your repo (docs or public).

> where I want to also keep some files that are not all managed through blogdown.

I wanted to know more because of this: are you talking about keeping them only in the git repo, or in the website as well?

This may be some specific usage we need to better support.

joftius commented 3 years ago

@cderv Since I use github pages my website is a github repo. I build locally on my machine and then push the repo to publish changes. So for example my website is hosted at http://joftius.github.io but I also host other content on the same repo like slides I might need to link to when giving a talk, or a syllabus for a course, etc.

cderv commented 3 years ago

Oh I see. You are using a User site: the GitHub repo is the website. You are not pushing the source files to GitHub, but you use blogdown to render into this "website project" directory, where other content also lives.

The other way to manage this is to have a project for your Hugo/blogdown website, keep the source in that project, and build the site into gh-pages, or into the public or docs folder that is served using GitHub Pages (choosing a source for GitHub Pages).

FYI, regarding Hugo websites: when keeping the source of your website on GitHub, I believe those "other files" can live in the static/ folder: https://bookdown.org/yihui/blogdown/static-files.html They will be copied by Hugo to the publishDir when building.

Anyway, I don't think you should change anything in your workflow. I think we should look into the publish directory when cleaning and check whether it is a git repo (a .git folder exists). If so, we don't remove the publish dir but just warn about it. That would prevent those problems and be more supportive of User and Organisation websites using GitHub Pages. We should also default to preview = TRUE.
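The git-repo check proposed above could look roughly like the following sketch. Both function names are hypothetical and are not part of blogdown or rmarkdown; the sketch only assumes that a repository root can be recognized by a `.git` entry inside it.

```r
# Hypothetical helper (not part of blogdown or rmarkdown): TRUE when `dir`
# contains a `.git` entry, i.e. it looks like the root of a git repository.
is_git_root <- function(dir) {
  file.exists(file.path(dir, ".git"))
}

# Hypothetical guard: delete the publish directory only when it is not a
# git repository root; otherwise warn and leave everything in place.
safe_clean_publish_dir <- function(publish_dir) {
  if (!dir.exists(publish_dir)) {
    return(invisible(FALSE))
  }
  if (is_git_root(publish_dir)) {
    warning(
      "'", publish_dir, "' looks like a git repository; not deleting it.",
      call. = FALSE
    )
    return(invisible(FALSE))
  }
  unlink(publish_dir, recursive = TRUE)
  invisible(TRUE)
}
```

The function returns TRUE (invisibly) only when something was actually deleted, so a caller could report which case occurred.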

cderv commented 3 years ago

I have given this some more thought after discussing with @apreshill.

Currently, clicking the Clean All button in the IDE will

One easy solution:

_book/

You can set options(bookdown.clean_book = TRUE) to allow this function to always clean up the book directory for you.


Let's note that in **bookdown**, the Clean All button will call `clean_book(clean = FALSE)` **BUT** the files will be removed anyway, because that is what `rmarkdown::render_site()` is doing by default.

Another solution: 
* Change the behavior of the button so that it directly calls a clean function in the site_generator, leaving it up to the package author to decide whether to delete or not.

I think this would require a change in the IDE, so it is less easy... it is a long-term strategy to discuss.

It is easier to make `clean_site()` more chatty as it is in our hands in the **rmarkdown** package.

yihui commented 3 years ago

I'd very much like to make rmarkdown::clean_site(preview = TRUE) the default. Deleting (potentially) a large number of files and dirs is a dangerous operation, especially when it can't be undone.

With preview = TRUE, we also need a message to tell users what to do if they are sure they want to delete the files (i.e., call rmarkdown::clean_site(preview = FALSE) in the R console).
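As a rough illustration of the preview-by-default behavior described here, the sketch below uses a hypothetical `clean_paths()` wrapper. It is not the rmarkdown implementation; the function name and the message wording are assumptions made for this example.

```r
# Hypothetical wrapper, not the rmarkdown implementation. `paths` stands for
# the files and directories a site generator reports as safe to remove.
clean_paths <- function(paths, preview = TRUE) {
  paths <- paths[file.exists(paths)]
  if (length(paths) == 0) {
    message("Nothing to clean.")
    return(invisible(character()))
  }
  if (preview) {
    # Default: only report what would be deleted and say how to proceed.
    message(
      "These files and folders would be removed:\n",
      paste0("* ", paths, collapse = "\n"),
      "\n\nTo actually delete them, call this function again with preview = FALSE."
    )
    return(invisible(paths))
  }
  # Explicit opt-in: the deletion cannot be undone.
  unlink(paths, recursive = TRUE)
  invisible(paths)
}
```

With this shape, a button that calls the function with no arguments can only ever list files; deleting requires typing `preview = FALSE` deliberately.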

Feel free to send a PR to rmarkdown.

For bookdown, the clean function in its site generator should only return a vector of paths, and let rmarkdown::clean_site() decide what to do, instead of having its own clean = FALSE argument.
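The division of responsibility suggested here might be sketched as follows. `book_generator()` and its default directory names are hypothetical and only illustrate the idea that a generator's clean function reports paths without deleting anything; this is not bookdown's actual code.

```r
# Hypothetical generator, for illustration only (not bookdown's code).
# Its clean() has no side effects: it only returns the existing paths that
# the caller may decide to remove.
book_generator <- function(input = ".", output_dir = "_book") {
  clean <- function() {
    candidates <- file.path(input, c(output_dir, "_bookdown_files"))
    candidates[file.exists(candidates)]
  }
  list(name = "book", output_dir = output_dir, clean = clean)
}
```

Because clean() never calls `unlink()`, the decision to preview or delete stays in a single place, the caller.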