stark-t / PAI

Pollination_Artificial_Intelligence

Prepare download & image preparation script for the table with image URLs #71

Closed valentinitnelav closed 1 year ago

valentinitnelav commented 1 year ago

Add a metadata file describing the data tables (the txt files with annotation and URL information).

valentinitnelav commented 1 year ago

OK, the final decision is that it is safer not to publish even the URLs, because of possible legal implications that fall outside German jurisdiction. This makes the entire process less reproducible, but we can provide the datasets that I prepared on individual request. I'll move them to a Nextcloud link.

valentinitnelav commented 1 year ago

I tried to clean the history like this (on Ubuntu), but it didn't work:

I used BFG Repo-Cleaner, the tool recommended by GitHub at https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository

Follow instructions at: https://rtyley.github.io/bfg-repo-cleaner/

Installation of bfg-repo-cleaner: download the jar file from https://repo1.maven.org/maven2/com/madgag/bfg/1.14.0/bfg-1.14.0.jar. It can then be executed directly in the terminal, provided Java is installed (I already had it).

If Java is not already installed, install the default Java Runtime Environment (JRE) on the system:

sudo apt-get install default-jre
cd /home/vs66tavy/Nextcloud/paper_01/git_archive
git clone --mirror https://github.com/stark-t/PAI.git

# This didn't work in one go, I had to break it down:
# java -jar bfg-1.14.0.jar --delete-files img_annotation.txt img_url.txt syrphid_img_annotation.txt syrphid_img_url.txt PAI.git

java -jar bfg-1.14.0.jar --delete-files img_annotation.txt PAI.git
java -jar bfg-1.14.0.jar --delete-files img_url.txt PAI.git
java -jar bfg-1.14.0.jar --delete-files syrphid_img_annotation.txt PAI.git
java -jar bfg-1.14.0.jar --delete-files syrphid_img_url.txt PAI.git
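
The four calls above can also be written as a loop. This is a minimal sketch: the helper name `delete_files_one_by_one` and the `BFG` variable are hypothetical, and it assumes (as the failed single call suggests) that `--delete-files` accepts only one file name or glob per run.

```shell
# Command used to launch BFG; the jar name matches the download above.
# Override it (for example BFG=echo) for a dry run that only prints arguments.
BFG=${BFG:-java -jar bfg-1.14.0.jar}

# Hypothetical helper, not part of BFG: run one BFG pass per file name.
# Usage: delete_files_one_by_one PAI.git img_annotation.txt img_url.txt ...
delete_files_one_by_one() {
  repo=$1
  shift
  for f in "$@"; do
    # $BFG is left unquoted on purpose so "java -jar ..." splits into words.
    $BFG --delete-files "$f" "$repo" || return 1
  done
}
```

The loop stops at the first failing pass, so a typo in one file name does not go unnoticed.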

cd PAI.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive

# Enumerating objects: 894, done.
# Counting objects: 100% (894/894), done.
# Delta compression using up to 8 threads
# Compressing objects: 100% (885/885), done.
# Writing objects: 100% (894/894), done.
# Selecting bitmap commits: 131, done.
# Building bitmaps: 100% (105/105), done.
# Total 894 (delta 563), reused 302 (delta 0), pack-reused 0

git push

# But I got this error:

Username for 'https://github.com': valentinitnelav
Password for 'https://valentinitnelav@github.com': 
Enumerating objects: 155, done.
Counting objects: 100% (24/24), done.
Writing objects: 100% (155/155), 94.35 MiB | 19.88 MiB/s, done.
Total 155 (delta 24), reused 24 (delta 24), pack-reused 131
remote: Resolving deltas: 100% (97/97), completed with 6 local objects.
To https://github.com/stark-t/PAI.git
 + 9869d65...2ac8a7b main -> main (forced update)
 ! [remote rejected] refs/pull/1/head -> refs/pull/1/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/10/head -> refs/pull/10/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/11/head -> refs/pull/11/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/12/head -> refs/pull/12/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/13/head -> refs/pull/13/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/14/head -> refs/pull/14/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/15/head -> refs/pull/15/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/2/head -> refs/pull/2/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/3/head -> refs/pull/3/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/4/head -> refs/pull/4/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/5/head -> refs/pull/5/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/6/head -> refs/pull/6/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/7/head -> refs/pull/7/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/8/head -> refs/pull/8/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/9/head -> refs/pull/9/head (deny updating a hidden ref)
error: failed to push some refs to 'https://github.com/stark-t/PAI.git'

git push --force gives the same error.
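
In the output above, the line `+ 9869d65...2ac8a7b main -> main (forced update)` shows that the `main` branch itself was rewritten on the remote. Only the `refs/pull/*` refs were rejected. They are part of a `--mirror` clone and appear to be read-only on GitHub's side. A minimal sketch that pushes only branches and tags, and so should avoid these rejections, follows. The helper name `push_heads_and_tags` is hypothetical. The target is given as a URL rather than as `origin` because, as far as I know, git refuses explicit refspecs on a remote configured for mirroring.

```shell
# Hypothetical helper: push only branches and tags from a (mirror) clone,
# leaving refs/pull/* untouched.
# Usage: push_heads_and_tags PAI.git https://github.com/stark-t/PAI.git
push_heads_and_tags() {
  repo_dir=$1    # path to the local clone, e.g. PAI.git
  target_url=$2  # URL or path of the repository to update
  git -C "$repo_dir" push --force "$target_url" \
    'refs/heads/*:refs/heads/*' \
    'refs/tags/*:refs/tags/*'
}
```

This does not touch the pull-request refs, so the old commits may still be reachable through them on GitHub. Removing those probably requires contacting GitHub Support, as described in the GitHub documentation linked earlier.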

valentinitnelav commented 1 year ago

Sadly, I am out of ideas. I think we just need to have a fresh repository :/ The link to it needs to be updated in the manuscript as well.

Another idea: I can send you the pruned history of PAI.git that I have, to be pushed to a new repository. Or, more simply, when you create the new repo, delete the hidden .git folder from your local copy and initialize a new git repo from scratch. This deletes all history (all commits) and is the safest approach.
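
The from-scratch option could look roughly like this. It is a sketch only: the helper name `start_fresh_history` and the commit message are hypothetical, and it assumes the files that must not be published have already been deleted from the working directory.

```shell
# Hypothetical helper: throw away all git history in a working copy and
# start again with a single initial commit.
# Everything still present in the directory ends up in that commit, so
# remove files that must stay private before calling it.
start_fresh_history() {
  work_dir=$1
  rm -rf "$work_dir/.git"             # drops all commits, branches, and tags
  git -C "$work_dir" init -q          # new, empty repository in the same folder
  git -C "$work_dir" add -A
  git -C "$work_dir" commit -q -m "Initial commit"
}
```

After that, the new GitHub repository can be added as a remote with `git remote add origin` followed by its URL, and the single commit pushed with `git push -u origin` followed by the branch name.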

@stark-t if you make a new repository, then here is some inspiration for the name:

valentinitnelav commented 1 year ago

OK, after further consideration and double-checking each type of license that appears in our dataset, I would say we are OK to publish the URLs:

#                                                license
#   1:                                      CC BY-ND 4.0
#   2: http://creativecommons.org/licenses/by-nc-nd/4.0/
#   3: http://creativecommons.org/licenses/by-nc-sa/4.0/
#   4:    http://creativecommons.org/licenses/by-nc/4.0/
#   5:    http://creativecommons.org/licenses/by-nd/4.0/
#   6:    http://creativecommons.org/licenses/by-sa/4.0/
#   7:       http://creativecommons.org/licenses/by/4.0/
#   8: http://creativecommons.org/publicdomain/zero/1.0/

I will have another exchange with the law firm regarding other questions, and once everything is clear I will put back the tables with URLs and the scripts that download and prepare the image datasets.