Open happybeing opened 4 years ago
In short, your goal is going to be to remove entirely (of maybe keep it just for testing purpose?) the GetPath()
function in the repositories interfaces.
I see two different kind of usages: the Bleve full text search index, and writing/reading cache files.
Disclaimer: the following is very raw thoughts, the reality is likely going to be different once confronted with the code.
Bleve index:
In https://github.com/MichaelMure/git-bug/blob/master/cache/repo_cache_bug.go#L109-L127, GetPath
is used to instruct Bleve where to write it's index. The goal here is going to add a RepoBleve
interface (similar to RepoKeyring
) to give access to a Bleve index. Once you have that, you can change the actual implementation: the GoGit repo will use an actual on disk Index as currently, while https://github.com/MichaelMure/git-bug/blob/master/repository/mock_repo.go will use the in-memory version of that index (provided by Bleve). I'm not exactly sure what kind of repo you need for your project, but having this interface will allow you to swap actual implementation and tune it the way you want.
cache files:
A bunch of files are read/written on disk by the cache, for example https://github.com/MichaelMure/git-bug/blob/master/cache/repo_cache_bug.go#L68-L71. The goal here is going to change the way the disk is accessed, again by having an additional interface in Repo. Instead of using GetPath()
to get the path on disk and use the normal file access from the standard library, the goal is going to be to provide a go-billy
filesystem for some places (at least .git/git-bug
) and read/write files with that. As before, once you have that you can swap out the actual implementation to suit your need and work only in memory.
Thanks Michael that's very helpful and sounds exactly what I need. For the p2p features I'm going to use a filesystem like interface (so go-git and git-bug would behave as for the CLI reading and writing ./.git
but via a go-billy fs).
I'll also need to try making a git-bug library, so might try that first.
@MichaelMure I've been familiarising myself with the parts of the code at commit (2 Nov, #25b0c71), so just repo cache and not bleve, which accesses the disk. The following is a summary of what is making use of repo.GetPath()
separated into needed now and optional/could be done later:
Places which must be modified for use with either os or go-billy filesystem:
cache/repo_cache_bug.go:
loadBugCache() # uses bugCacheFilePath()
writeBugCache() # uses bugCacheFilePath()
cache/repo_cache_common.go:
RepoCache.GetPath() # Just reflects repo.GetPath()
cache/repo_cache_identity.go:
loadIdentityCache() # uses identityCacheFilePath()
writeIdentityCache() # uses identityCacheFilePath()
cache/repo_cache.go:
repoIsAvailable() # uses repoLockFilePath()
Optional / later - places which can continue calling Repository.GetPath()
because they can assume os filesystem for now at least:
bug/bug_test.go:
TestBugRemove() # Only used with 'file:///' path (for remotes)
cache/repo_cache_test.go:
TestRemove() # Only used with 'file:///' path (for remotes)
input/input.go:
# Invoking an external editor can assume os filesystem is ok
launchEditorWithTemplate()
launchEditor()
repository/gogit_test.go:
TestNewGoGitRepo() # Just compares path strings
repository/repo_gesting.go:
CleanupTestRepos() # calls os.RemoveAll()
repository/git_testing.go:
SetupReposAndRemote()
# Not sure - can we assume selection will only apply for CLI operations (where os storage can be assumed)?
commands/select.go:
selectFilePath() # Saves bug id that is selected (active) for operations
And here are the things that will or may need to be implemented by a new interface, when using go-billy.Filesystem
:
os methods used:
Create()
Open()
OpenFile() used by select.go: Select()
Write()
WriteAll()
# Just for file locking (see cache/repo_cache.go):
Getpid()
IsNotExists()
MkdirAll
Remove()
File methods used:
Write()
WriteString() # file locks only (write PID)
Close()
see also:
gob.NewDecoder(File) # in loadBugCache(), loadIdentityCache() for serialisation
io.LimitReader(File, int) # in cache/repo_cache.go: repoIsAvailable() which handles locks
As I understand your suggestion, this can be done by adding type RepoStorage interface { Storage() GitBugStorage }
to git-bug Repo
, where GitBugStorage
is an interface which specifies the needed os functions.
Then GoGitRepo
will provide an implementation GitBugStorage
which uses a gobilly.Filesystem
, which can be either go-billy osfs or memfs, where operating system storage is used for the git-bug CLI, or in memory (or p2p) storage will be used for an in-browser application such as the p2p git portal.
Is this what you had in mind, sound sensible?
If the above is good, what's the recommended way to implement this in ./repository, should I create a file gogit_storage.go containing a struct GogitStorage
holding the go-billy.Filesystem
object with an implementation of the GitBugStorage
interface?
Pretty much. I looked in more details how GetPath is used:
what is the local git remote URL ?
repoLockFilePath --> lock file, in .git/ bugCacheFilePath --> bug index file, in .git/ searchCacheDirPath --> bleve path, in .git/ identityCacheFilePath --> identity index file, in .git/ selectFilePath --> select file, in .git/
launchEditor --> any temp file launchEditorWithTemplate --> any temp file
os.RemoveAll --> delete repo
So it seems that everything that is needed is an access to $RepoPath/.git/git-bug with a billy.Filesystem
and some functions only used for testing.
I believe the changes needed to the Repo interfaces is as follow:
diff --git a/repository/repo.go b/repository/repo.go
index 4b45a1c5..d34995f9 100644
--- a/repository/repo.go
+++ b/repository/repo.go
@@ -4,6 +4,8 @@ package repository
import (
"errors"
+ "github.com/go-git/go-billy/v5"
+
"github.com/MichaelMure/git-bug/util/lamport"
)
@@ -20,6 +22,7 @@ type Repo interface {
RepoKeyring
RepoCommon
RepoData
+ RepoStorage
}
// ClockedRepo is a Repo that also has Lamport clocks
@@ -48,9 +51,6 @@ type RepoKeyring interface {
// RepoCommon represent the common function the we want all the repo to implement
type RepoCommon interface {
- // GetPath returns the path to the repo.
- GetPath() string
-
// GetUserName returns the name the the user has used to configure git
GetUserName() (string, error)
@@ -64,6 +64,11 @@ type RepoCommon interface {
GetRemotes() (map[string]string, error)
}
+type RepoStorage interface {
+ // LocalStorage return a billy.Filesystem giving access to $RepoPath/.git/git-bug
+ LocalStorage() billy.Filesystem
+}
+
// RepoData give access to the git data storage
type RepoData interface {
// FetchRefs fetch git refs from a remote
@@ -145,4 +150,10 @@ type TestedRepo interface {
type repoTest interface {
// AddRemote add a new remote to the repository
AddRemote(name string, url string) error
+
+ // GetLocalRemote return the URL to use to add this repo as a local remote
+ GetLocalRemote() string
+
+ // EraseFromDisk delete this repository entirely from the disk
+ EraseFromDisk() error
}
Naming the function LocalStorage()
leave some door open to add a GlobalStorage()
or else later if a common storage become needed. defaultKeyring()
use os.UserConfigDir()
which would bypass but that doesn't seem to be a problem for your project (yet?).
Thanks Michael, just skimming as I'm done for today. I'd shy away from using LocalStorage()
being too similar to the localStorage browser API, see https://developer.mozilla.org/en-US/docs/Web/API/Window/localStorage
Will look more tomorrow.
The browser LocalStorage could be a legit implementation of this interface, depending on your usage ;)
It's the same name but a very different context so I think it's fine.
window.localStorage
is a key/value rather than filesystem API so non trivial to implement a fs. We have go-billy and other options in both contexts anyway. BTW I've done something equivalent and it's unpleasant to say the least! I don't see the point in using a name that can confuse unless we have no other decent options.
type RepoStorage interface {
// LocalStorage return a billy.Filesystem giving access to $RepoPath/.git/git-bug
LocalStorage() billy.Filesystem
// ....
GlobalStorage() billy.Filesystem
// ...
TempStorage() billy.Filesystem
}
Don't you think that's reasonable?
The same concept is already used for LocalConfig()
and GlobalConfig()
.
What are the three categories used for?
Only LocalStorage()
would be there for now, but the logical continuation (if we need it) would be to add the next two.
GlobalStorage()
would provide an access to a storage space shared by add the repo (say, the folder given by os.UserConfigDir()
)
TempStorage()
would provide an access to a temporary storage space (say, /tmp/git-bug
).
My point is, LocalStorage()
does remind of window.localstorage
and could be misleading but I'd say that the interface above make sense in the context of git-bug, which is really what matter.
The choice is yours obviously, but I may make suggestions when I understand better 😄.
So your distinction is that LS is a store per repo, whereas GS is more like a PC filesystem holding several repos?
GS is more like a PC filesystem holding several repos
It doesn't matter if this filesystem also hold the actual repos. What matter is that this function return the same shared filesystem when called from any repo.
Ok. I'm not sure, but for my use case GS may be what's needed but I'm still familiarising myself with go-git, which to confuse matters can have both Storage (for git objects) and go-billy.Filesystem (for a working tree) as I expect you know. Storage's everywhere!
I'll probably go to code next and figure out these details that way. You've been very helpful, thanks. So far it looks straightforward except perhaps for the file locking, although I think that should be ok too.
EDIT: no worries for now, but I'm not getting the need for GS, LS and TS. If you need the repos to be on the same storage, you can just give them all the same storage object, if separate you give each one a separate storage object. I don't see why this would be the choice of the code accessing the repo.
no worries for now, but I'm not getting the need for GS, LS and TS.
There is no need for GS and TS ... for now. But I could see usage for them in the future.
An example would be the Keyring. The Keyring is a shared (=global in the git terminology) storage space for credentials. When a bridge is configured in a repo, the credentials become available in every other repo for when you configure your next bridge.
At the moment the Keyring is already swapped out by a memory-only implementation in the mock repo so it's not a problem but if we were to re-implement that, we could use this GS and read/write files there.
TS is just another example, I don't have usage in mind just yet, but a temporary file is useful sometimes.
I think I'll start with Storage()
as I'm not sure yet if it will be local or global and we/you can rename it appropriately later, or when we need you have multiple categories.
@MichaelMure I see there's a 'select' mechanism which writes to disk, presumably so the CLI can have a current issue which is being addressed with multiple commands, so is git-bug writing cache and bleve index to disk for similar CLI convenience, or are there good reasons for wanting them to persist for longer periods (from one day to the next, ideally forever etc)?
This is not an immediate issue, but something for me to think about.
Persist over long period not necessarily but across multiple CLI call, yes.
All those files (cache index, bleve, select) could be deleted without loosing too much. The downside is that everything will be re-indexed with the next call, which can be expensive.
@MichaelMure I'm a month behind master, so maybe some things are in flux but I'm wondering about when git-bug uses git to do operations, and when it uses go-git. Is the code which spawns git commands obsolete or being refactored, or is it still in use and likely to remain so?
A couple of other points:
NewPersistedClock()
to use the billy.Filesystem
and so persist on the device. Just noting this.buildCache()
where you use os.Stderr
with fmt.Fprintln()
and fmt.Fprintf()
which I'll need to modify. I can change them to println()
for now which works with wasm, or can provide a new FprintError()
which will compile to use fmt.Fprintln()
, or when built for wasm to println()
. Let me know if you prefer something else.The old git CLI implementation is still around but it's not used anymore in the master branch. I think I'll keep it around until I'm sure things are really stable.
Which implementation is used is defined in those functions: https://github.com/MichaelMure/git-bug/blob/master/commands/env.go#L49-L137 Each command use one of those functions as a pre-run: https://github.com/MichaelMure/git-bug/blob/master/commands/label_add.go#L15
So in your case we'll have to figure out a way to inject the proper Repo implementation. There is plenty of way to do that, we'll just have to find the less ugly one.
Thanks, that's what I was hoping :smile:. I've been looking at the code you reference just now.
To call NewGoGitRepo()
I see env.go has functions like loadBackendEnsureUser()
which aren't exported (no leading cap) but are obviously called by the commands handler presumably via a struct. The lack of leading capital made it harder to understand the code as I'm using that to signify local functions but I see why that is now, just noting it.
If we decide I should import them elsewhere I'd need to capitalise them, although my first thought is that I wouldn't use your loadRepo()
etc, but do this myself. Not sure yet. Again just noting in case you want to comment.
I'm not sure there's an issue regarding 'injecting' the repos as you're already using gogit.Repository
. I am not planning to change that, but to create the gogit.Repository
to use a billy.Filesystem
, and so would use the existing loadRepo()
function for existing code using g-billy osfs
, and provide a new one which supplies its own billy.Filesystem
for the p2p git portal. I can then modify any code which is currently accessing the os
filesystem calls, and change it to use the repo's billy.Filesystem
to access .git/git-bug
. Or am I missing something? Quite possibly! :smile:
The branch gobilly-test-initfs at commit https://github.com/happybeing/git-bug/commit/25b0c71948fe4bf5b4f4b8ca91ffc4d2f8c47643 is working with the git portal proof-of-concept, branch main at https://github.com/happybeing/p2p-git-portal-poc/commit/9c6f2efebe880cdf899498338bffe3c39627c5fe).
The above allows work to continue in parallel on both the p2p git portal and to integrate these changes with the latest git-bug (including bleve indexing), and provide an improved API to allow use of git-bug with an external billy.Filesystem.
@happybeing:
InitFs..(),
OpenFs..()and
GetGitRepo()` from https://github.com/happybeing/git-bug/commit/25b0c71948fe4bf5b4f4b8ca91ffc4d2f8c47643)@happybeing + @MichaelMure:
git-bug currently uses operating system filesystem calls directly (e.g. https://golang.org/pkg/os/ in git-bug/cache) which prevents it being compiled for Web Assembly.
This task is modify git-bug to use the go-billy filesystem interface, also used by go-git which git-bug relies on for git functionality. Then to add support for wasm builds of git-bug, by building with this fork of go-billy (branch
support-build-wasm
) which has been modified to build successfully withGOARCH=wasm
.