ssc-oscar / gather

For harvesting latest repos

run.sh: get list of git repos #5

Closed: pidanself closed this issue 5 years ago

pidanself commented 5 years ago

Hi, Professor. I'm one of Professor Zhou's students. I'm trying to add a forge to run.sh to get the list of git repos from https://cgit.drupalcode.org/, and I have one question about it. My understanding is that our goal is to get the URLs of all tags on https://cgit.drupalcode.org/, in a form that can be used with git clone. Is my understanding correct? Thanks for your kind support.

audrism commented 5 years ago

Exactly!

pidanself commented 5 years ago

Thanks! I will try it.

pidanself commented 5 years ago

Hi, Professor.
I have analyzed that site and found that there are no direct URLs for individual tags that we could git clone.
But we can use the command "git clone -b 1.0 https://git.drupalcode.org/project/drupal.git ./1.0" to fetch the tag named 1.0 from the repository and store it locally.
So I think we could store each URL like this: "-b 1.0 https://git.drupalcode.org/project/drupal.git ./1.0". Here "1.0" is one of the tags in https://git.drupalcode.org/project/drupal.git, and "./1.0" is the corresponding local storage path.
I tried it in the shell like this:
url="-b 1.0 https://git.drupalcode.org/project/drupal.git ./1.0"
git clone $url
It correctly gets tag 1.0 of https://git.drupalcode.org/project/drupal.git. Is that OK? If so, I will try to write code to get the URLs of all tags in that form. Thanks for your kind support.
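The command in the comment above works because the unquoted $url is split into separate words by the shell. A minimal sketch of the same clone with each argument passed and quoted separately follows; clone_tag is a hypothetical helper name used only for illustration and is not part of run.sh.

```shell
# Clone a single tag of a repository into a directory named after the tag.
# clone_tag is a hypothetical helper, not part of run.sh.
clone_tag() {
  tag=$1
  repo_url=$2
  # Quoting each argument avoids relying on word splitting of one long string.
  git clone -b "$tag" "$repo_url" "./$tag"
}

# Example call (needs network access, so it is left commented out):
# clone_tag 1.0 https://git.drupalcode.org/project/drupal.git
```

Keeping the tag and the URL as separate values also makes it easier to store them as two columns rather than as one pre-built argument string.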

audrism commented 5 years ago

Ok, I misunderstood your first email. I need all the git repos. It is trivial to get all tags for a single repo.

See: https://www.drupal.org/contribute/development

"The Drupal code ecosystem encompasses the core of Drupal (the files that you get when you download Drupal from the Drupal project page), and "contrib" projects, which encompass all contributed code (modules, themes, installation profiles, etc.). You can read more about this distinction on the Core vs. contributed projects page. You can also improve both Drupal core and contributed projects by submitting patches."

Each project, in my understanding, has a separate git repo: is that correct? If so, I'd like to obtain urls for all these git repositories.
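As an illustration of why listing the tags of a single repository is considered trivial, here is a small sketch based on git ls-remote, which queries a remote without cloning it. The function name list_tags is an assumption made for this example.

```shell
# Print the tag names of a remote repository, one per line, without cloning.
# list_tags is a hypothetical helper name.
list_tags() {
  # --tags limits the output to refs/tags/*; --refs omits peeled "^{}" entries.
  git ls-remote --tags --refs "$1" | sed 's#^.*refs/tags/##'
}

# Example call (needs network access, so it is left commented out):
# list_tags https://git.drupalcode.org/project/drupal.git
```

The harder part, as the comment above points out, is enumerating the repositories themselves.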

pidanself commented 5 years ago

Got it!
I will try it.
Thanks for your kind support.

pidanself commented 5 years ago

Hi, Professor. I have found that all the git repositories we need are listed at https://git.drupalcode.org/explore/projects. I checked it against https://www.drupal.org/project/drupal, and all projects are stored there. The site has 2398 pages with 20 projects per page, so I wrote a shell script to get the URLs of all these git repositories. My code is at https://github.com/pidanself/gather_test: test.sh is the script that collects the URLs, and the URLs are stored in drupal.com. The script fetches the project pages sorted by name, with archived projects shown. It sleeps 2 seconds after every 10 requests so that it does not get blocked. So far I have collected the URLs from 154 pages, and I ran git clone on some of them; all succeeded. Is that OK?
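The approach described above can be sketched roughly as follows. This is not the content of test.sh: the helper names are hypothetical, and the assumption that each project appears on the page as a link of the form href="/project/NAME" is inferred from the clone URL quoted earlier in the thread, not verified against the live site.

```shell
# Read the HTML of one explore page on stdin and print clone URLs.
# Assumes project links look like href="/project/NAME" (unverified).
extract_repo_urls() {
  grep -o 'href="/project/[^"/]*"' |
    sed 's#^href="\(.*\)"$#https://git.drupalcode.org\1.git#' |
    sort -u
}

# Pause after every 10th page to reduce the risk of being blocked.
# PAUSE defaults to the 2 seconds mentioned in the comment above.
throttle() {
  page=$1
  if [ $((page % 10)) -eq 0 ]; then
    sleep "${PAUSE:-2}"
  fi
}
```

The extraction step reads from stdin so it can be checked against a saved page before any requests are sent.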

audrism commented 5 years ago

Great, can you add the code directly to run.sh in a PR?

pidanself commented 5 years ago

Ok, I will add the code to run.sh.

audrism commented 5 years ago
  1. Can you make the code go until the last page (i.e., as the number of projects increases, the fixed number will not be enough)?

  2. What kind of next task would you like to do?
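One way to address the first point is to stop when a page yields no project links instead of looping up to a fixed page count. The sketch below is only an illustration under stated assumptions: fetch_page and harvest_all are hypothetical names, the sort and archived query parameters are guesses at how the listing is requested, and it assumes that a page past the end contains no href="/project/NAME" links.

```shell
# Fetch one listing page. The query parameters are assumptions.
fetch_page() {
  curl -fsS "https://git.drupalcode.org/explore/projects?page=$1&sort=name_asc&archived=true"
}

# Print clone URLs page by page until a page contains no project links.
harvest_all() {
  page=1
  while :; do
    urls=$(fetch_page "$page" |
      grep -o 'href="/project/[^"/]*"' |
      sed 's#^href="\(.*\)"$#https://git.drupalcode.org\1.git#' |
      sort -u)
    # An empty result is treated as "past the last page".
    [ -n "$urls" ] || break
    printf '%s\n' "$urls"
    # Pause after every 10th page, as in the original description.
    if [ $((page % 10)) -eq 0 ]; then
      sleep "${PAUSE:-2}"
    fi
    page=$((page + 1))
  done
}
```

A failed request also produces an empty result and would end the loop early, so a real script would likely want to distinguish a fetch error from a genuinely empty page.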

pidanself commented 5 years ago

Hi, Professor. I have created a pull request to add my code, and it now runs until the last page.

audrism commented 5 years ago

Thank you!