hraban / tomono

Multi- To Mono-repository merge
https://tomono.0brg.net
GNU Affero General Public License v3.0

License missing #30

Closed sebastianludwig closed 2 years ago

sebastianludwig commented 5 years ago

We'd like to build upon your work. However, this repository is missing a license. Could you please add one?

pcentgraf commented 3 years ago

I'd like to repeat the request for a formal license on this project. I am willing to contribute some substantial improvements, but the murky license status is making that difficult. It appears that Ravelin has transferred control of the repository to you. Can we have them release the copyright to you also? Then you can go ahead with AGPLv3 or whatever you prefer.

Improvements I could contribute under AGPL:

hraban commented 3 years ago

hi pcentgraf,

Thanks for your comment and interest. I'm on and off rewriting this whole project myself so I can have full copyright and release under AGPL.

Unfortunately, licensing is up to the copyright holders, not to whoever controls the project. The fact remains that while Ravelin transferred ownership of the repo to me, they didn't transfer the copyright to me, and they didn't want to release the code under their ownership under AGPL.

It's good to know there is still interest in the project, I will spend more time looking at it over the coming weeks to finalize the rewrite.

On the plus side, there is an easier way to fix the dangling objects problem than mucking with disabling GC :) namely, doing the merge in the working directory instead of the index. That was the first iteration of the project, originally abandoned for speed in a classic case of premature optimisation.

I'll ping you on this issue when the rewrite is complete and released under AGPL.

Cheers

Hraban

pcentgraf commented 3 years ago

I found that by working exclusively with the index, I was able to improve performance by more than an order of magnitude. Using git's read-tree, commit-tree, and update-ref commands, I was able to avoid the workspace entirely, which saves enormous amounts of redundant disk activity. Whatever optimization is lost by disabling packing is greatly outweighed by not writing out the entire workspace, at least in my case (~90 mins down to ~7 mins).

For my use case, we also didn't care about preserving the hashes of historical commits, so I chose to use 'git filter-repo --subdirectory-filter' on each input repository, before merging. (Filter-repo is also really fast. ~3 mins to rewrite all of our input repos.) This simplified the index operations somewhat during merging. The result is a cleaner history that doesn't rely on git's ability to track renames, which doesn't work well for large repositories. Obviously, the tradeoff is that the historical hashes aren't preserved.

With these two techniques, plus disabling gc.auto, I was able to merge 70+ repositories into a monorepo with ~450 MB of packed content in about 10 mins, working with all data on a 2017 MacBook Pro's local SSD.
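As a rough illustration of the gc.auto part, the sketch below turns off automatic garbage collection for the duration of a bulk import and repacks once at the end. The repository path is hypothetical, and whether to restore the setting and repack afterwards is my assumption, not something stated in this thread.

```shell
#!/usr/bin/env bash
# Sketch: disable automatic gc during a bulk import, then repack once.
set -euo pipefail

repo="$(mktemp -d)/mono.git"   # hypothetical target repository
git init -q --bare "$repo"

# gc.auto=0 stops git from running "gc --auto" behind our back,
# so freshly written, not-yet-referenced objects are left alone.
git -C "$repo" config gc.auto 0
during=$(git -C "$repo" config --get gc.auto)

# ... the fetch / read-tree / commit-tree work would happen here ...

# Restore the default and pack everything in one pass.
git -C "$repo" config --unset gc.auto
git -C "$repo" gc --quiet

echo "gc.auto during import: $during"
```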

I'm happy to share my diff with you directly, if you're already working on a rewrite. Bloomreach (my employer) has no concern about releasing claims on my work under whatever license you prefer. We just don't want to be accused of redistributing code to which we don't have a license.

hraban commented 2 years ago

For anyone still interested in this: the cleanroom rewrite is complete and licensed under AGPLv3. It uses no work tree, only the index. 👍 Better late than never :) I consider this issue closed.

hraban commented 2 years ago

thanks for your consideration and effort, all!