Closed makkus closed 13 years ago
I like option 2 and I think grid://xyz makes more sense than gsiftp://xyz to the average user. I think a mounted path is probably going to be confusing if a user is working from their own machine.
Are there any other logical options, other than grid or gsiftp? These seem quite arcane, though I guess training on their meaning would be possible. Do they mean "remote file"? Is there a more common-language way of expressing this? Perhaps "remote".
In practice, yes, they mean remote. Since all those files are in grid-space, I decided to call the protocol part of the virtual filesystem "grid". Plus, it's short. I would think that users wouldn't normally use gridftp urls directly (underneath all grid:// urls resolve to gsiftp urls), but it will be useful in some cases, so I wouldn't recommend hiding that totally. gsiftp urls tend to get quite long, though...
So, do you suggest using urls like these?
remote://groups/ARCS/BeSTGRID/whatever
remote://jobs/jobname/stdout.txt
Looks reasonable to me :)
Does this mean that there is a "local" that is the default, implied when accessing local files? And if so, we should implement "local" too, for consistency?
I still prefer grid://, since it's shorter and as accurate as remote, it also describes somewhat better that this points to a virtual filesystem (which under the hood is fueled by the grid information system) which wraps grid filesystems and makes them more accessible, whereas remote could be anything. For example, we could also have http or ftp urls to use as job inputs for the attach command. Those files would also be remote.
Also, having "local://" or "file://" as an option to specify local files wouldn't hurt (except that it makes the code more complex and might confuse people), but I'd think using the default way of specifying local paths makes more sense. For one, it's shorter. Also, you can copy and paste paths from another terminal without having to prepend "local://". Just more practical. Also, gricli users would already know that way of specifying paths from other CLIs, and those paths describe the same files as they do in the rest of the shell, so why have different ways of pointing to them?
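A minimal sketch of that rule, assuming only grid:// and gsiftp:// count as remote and anything without one of those prefixes is an ordinary local path. The function name `classify_path` and the `REMOTE_SCHEMES` set are made up for illustration; they are not gricli code.

```python
# Hypothetical set of remote schemes; grid:// and gsiftp:// are the two
# protocols mentioned in this thread.
REMOTE_SCHEMES = {"grid", "gsiftp"}


def classify_path(path: str) -> str:
    """Return "remote" for grid:// or gsiftp:// urls, "local" otherwise."""
    scheme, separator, _ = path.partition("://")
    if separator and scheme.lower() in REMOTE_SCHEMES:
        return "remote"
    # No recognised protocol prefix: treat it like any other shell path.
    return "local"


print(classify_path("/home/markus/file.txt"))
print(classify_path("grid://jobs/grisu_job/file.txt"))
```

With this approach a path pasted from another terminal works unchanged, and only remote locations need the protocol prefix.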
Agree on the local.
Re grid, I don't mind shorter, but the justification as I noted earlier is reasonably arcane. I'd assume users would be able to deal with this though. I guess I was just checking to see whether there mightn't be more obvious options.
Fair enough, I am not 100% happy myself. Would need to be something that describes it is a way to access "all the users files that are managed within NeSI space".
Thought about compute:// and data:// , but those are not that good either.
What about: workspace:// ? Hm, not really, I guess...
nesi:// ? Would get the branding out, probably not so good for Grisu as an open source project...
This is an interesting way to think about this..
If gricli is aware of VOs, and is deployed in a multi VO environment, overlaying multiple infrastructures e.g. ARCS, NeSI, and TeraGrid, would "grid", or any single name, provide enough detail to resolve a virtual file space? Perhaps this could be configurable, or derivable, from an appropriately identifiable entity at the top of the hierarchy of any specific virtual home?
Not quite sure whether I understand what you are saying, but I think the virtual filesystem itself resolves some of those points, at least if you browse the "group" filesystem:
grid://groups/ARCS/BeSTGRID
grid://groups/nz/uoa
grid://groups/nz/nesi
the next tokens would always be the resources (file-resources) that are accessible via those VOs:
grid://groups/nz/nesi/markus.binsteiner (which means datafabric)
If we'd support TeraGrid and also VOs that can access resources there, then we could have something like: grid://sites/teragridSite/home/teraGridMappedUser/file or grid://groups/tg/tgproject1/file
Or, if TeraGrid supported the nesi VO, for example, it could be accessed like: grid://groups/nz/nesi/tguserHomeDirName/file.txt
All of those are backend configurations (and of course the permissions/mappings have to be set on each participating site) along with putting all this into mds.
So, looking at it from this angle, I think grid is a reasonably good term to describe the filespace we are talking about. It is a grid after all; at the moment we are mainly dealing with a grid on a national level, but that is easily (well...) extended...
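To illustrate the token structure described above, here is a rough sketch that splits a grid:// url into its virtual root ("groups", "sites", ...) and the remaining tokens. `GridUrl` and `parse_grid_url` are hypothetical names. Deciding where the VO ends and the file resource begins depends on the backend configuration, so the sketch does not attempt it.

```python
from typing import NamedTuple, Tuple


class GridUrl(NamedTuple):
    root: str                 # e.g. "groups", "jobs" or "sites"; "" for a bare grid://
    tokens: Tuple[str, ...]   # everything after the root, split on "/"


def parse_grid_url(url: str) -> GridUrl:
    """Split a grid:// url into its virtual root and the remaining tokens."""
    prefix = "grid://"
    if not url.startswith(prefix):
        raise ValueError(f"not a grid:// url: {url}")
    parts = [p for p in url[len(prefix):].split("/") if p]
    if not parts:
        # A bare "grid://" has no root selected yet.
        return GridUrl("", ())
    return GridUrl(parts[0], tuple(parts[1:]))


print(parse_grid_url("grid://groups/nz/nesi/markus.binsteiner"))
```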
Pondering, given what you note above, whether the file systems might be:
groups://nz/nesi/
sites://tg/sdsc/
sites://nesi/uoa/
what other qualifiers on grid virtual file systems are there?
Probably possible, but quite a bit of work to re-write the virtual filesystem code...
as in, commons-vfs.. or grisu backend code?
grisu backend and frontend code. Since virtual filesystem (talking about ours, not commons-vfs) is so different to normal filesystem there are a lot of checks throughout the code to get different behaviour depending on whether it is virtual or not. Also, internally the virtual filesystem is treated as one filesystem that uses a plugin system to redirect to the "sub-filesystems", I'd have to rewrite this code...
ok - the refactoring seems worthy of careful discussion.
I'm wondering though, independent of such a refactoring, should we go ahead with the virtual FS as is, hence grid://? Would going ahead as is add to the refactoring later, or not really?
To be honest, I think it's probably good enough. I'm sure we could come up with something that's a bit better and more intuitive, but I'm not sure whether it'd be worth the effort. If we realize people are really struggling with the concept we can always re-think. Although, in that case I guess we'd have to change more than just moving groups one token to the left...
The grid:// thing is only part of this issue (and to me not the more important one). The other part is whether we let users operate using only one command (cp) and mix filespaces (local and grid), or whether we have a command for each operation (upload, download, copy remote files, lcd, gcd, lls, gls).
I think it would be nice to move through the file system with one set of commands eg:
cd /home/whoami/myfiles
cd grid://nz/nesi/myfiles
I don't think having a protocol is a problem, people deal with http:// all the time
Good. Will do it like that then.
All right:
/opt/griclish-dev/griclish-dev -b dev
gives you an improved ls and attach command with file tab-completion for local and remote files. I think it's really sweet and should ease using especially the attach command.
Re: grid:// protocol. Have a look how it feels for you, but I think with this tab-completion thing it feels natural enough to use.
For ls: should it also display last modified and file size or not? Might make sense if one ls's a single file. Although, for some virtual filesystems that might not be possible in any case (because of the way it is implemented -- otherwise file-listings would take far longer). In those cases we could just display "n/a" or so.
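As a rough illustration of the "n/a" idea, assuming a listing entry carries an optional size and last-modified value that some virtual filesystems simply cannot provide. `format_entry` and the tab-separated layout are invented for this sketch and are not the actual gricli output format.

```python
from typing import Optional


def format_entry(name: str, size: Optional[int] = None,
                 modified: Optional[str] = None) -> str:
    """Format one listing row, falling back to "n/a" for unknown details."""
    size_column = "n/a" if size is None else str(size)
    modified_column = "n/a" if modified is None else modified
    return f"{name}\t{size_column}\t{modified_column}"


# A gsiftp-backed file where both details are known:
print(format_entry("stdout.txt", 1024, "2011-11-02 10:15"))
# An entry on a virtual filesystem that cannot cheaply provide them:
print(format_entry("nesi"))
```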
Opinions?
Next I'll implement the cp command...
ls: I'd suggest this should be consistent with standard shell implementations. So, if we want full details, ls -l would be normal? I'd also, for the same reason, suggest hidden files should equally be hidden for the default behaviour of ls.
When I'm in my home directory and have my group set to /nz/nesi/, I had assumed an ls would return my grid file systems as well.
how / where do I discover my grid file systems?
I can see now that I use gls to list grid file systems, which I discovered by doing a help gls and seeing the examples.
My first attempt to do this was by reviewing and updating the global group property to the appropriate group, and then running ls. This seemed logical, though on reflection probably isn't. The single presentation of filesystem spaces within grisu client implementations works because the entire hierarchy is visible at the time, and multiple roots are retained to anchor each subtree, whereas in changing my directory on the client it wouldn't be clear to me which directory the group directories should show up within. Though to be consistent between the two it might be appropriate to have a root that also contains the listings of grid directories for the currently set group.
Hm, not sure I understand.
Firstly: I'd like to get rid of gls altogether and only use ls from now on. You can use ls grid:// (that's why I was asking about how to name the protocol) to get a list of all your virtual "grid roots". As I said, for ls, tab completion for remote directories works now as it does for local ones, which should make the overall use more efficient and easy. Tab completion would give back the grid:// string as an option, independent of the dir you are currently in. ls won't list it though, since it's not a child of the current (or other) directory you try to list. That is why I asked (above) whether we should "mount" the grid somewhere into the local file hierarchy. I think we decided not to, but I might be mistaken...
Then, I don't think ls should change its output depending on which group is set. At the moment, the globals really only affect the (next) job that is about to be submitted, and that is all they are needed for. Checking job statuses, ls'ing and downloading files and all that don't need context, and requiring one would make it more complex for the user (at least 2 commands instead of 1 to list a directory on the grid).
Changing directories is different, and we haven't really started to think about it. If we only allow changing directories locally, there should not be major problems. If we also allow changing into remote directories (cd grid://groups/nz/nesi/) there might be implications we don't see at the moment. Maybe not, but I don't know. Worth talking about it.
Re: parameters like ls -l. I don't think it'd be easy to do something like that. For one, bash ls is designed to support being used in bash scripts with all the piping that makes bash so useful. We don't support that (and it'd be hard to start now because of some decisions we made early on) so some of those options wouldn't make so much sense within gricli.
Then, which of the 30-odd parameters of the bash ls command would we support? People might get confused and think they actually use the bash ls and not our gricli one.
Now, all that would probably be possible still (even if it would take quite a bit of time to implement), but the bigger obstacle is that we don't have -l type parameters in gricli at all. I don't think it'd be wise to have them for only one command; that'd break the user experience, in my opinion, and it'd be out of scope. Like Yuriy said a while ago, gricli is only designed to be a fairly simple CLI shell and it can't support too complex/extensive usecases. If we wanted it to be that, we would have had to make a few different design decisions at an earlier stage...
So, after Nick's questions/comments I'm a bit unsure again how we should proceed here. Should I continue and add grid:// support to gricli or is the plan a different one? I don't mind either way, but I'm having difficulties seeing what the best way forward would be, in terms of ease of use. Don't trust my view that much anymore, it might be clouded because of what I think is possible and how the Template client works and how I use a commandline interface (using tab-completion extensively). Our "normal" user might find that terrible...
I can run the options past our users to get some feedback on the direction we should take here?
Thanks Markus, there's some good coherent insights into how you're thinking packed in there. I hadn't been able to discover the ls grid:// from looking at help ls, so perhaps clarifying the help will... help. I hadn't attempted tab completion, and agree that this is a good thing to rely on, although as it's only been partially implemented in the past, I haven't gotten into the habit.
I don't think we need to have cd on remote directories right now, as login is primarily a job submission host and it's mostly the local directory that needs to persist when changed, I would assume. Open to further comment on this though.
Additional params for ls, and your comments on designing for script based use, remind me that we had pondered going down that path, recently, so I'm interested to understand the design implications for doing so, though not here and now :).
@smas036 - re users, yea, think there's some good things to test:
ls - do they describe behaviours intuitively?
ls - what should the default implementation look and feel like, and how about command extensions e.g. ls -l - what must we have, as a minimum set or a default behaviour?
Quick update:
Had a discussion with Yuriy and he thinks that "cp" is not necessary for gricli, since we have other tools for this. Not sure I agree 100% but I wouldn't mind Yuriy being right on this one, since implementing "cp" is a bit more work than I anticipated, given that we would need to be able to copy from/to all the virtual filesystems, and there lies quite a bit of complexity that wasn't obvious before I started. Here be dragons. I got it working for some cases (for example up-/download) gsiftp->gsiftp, but not for others (group-fs->job-fs for example, but there is more).
What is you guys' opinion on this? Worth the effort and complexity of code? Or not?
What are the other tools? SCP etc?
SCP, ueberftp, globus online. Not sure what else...
I think I should use a user mailing list to field these sorts of questions and get some feedback; it would be an easy way to keep them continuously updated with our progress and help us with bug fixes and priorities. It may be the case that they are happy with these other tools... or not. If you guys agree, I'll get started as soon as we have the appropriate list?
Yep, sounds like a good idea.
Should we remove gls now? Is ls good enough now?
Yup, I'll make it clear in the help that you can use it for grid locations
Ok, then I'll remove 'gls' from gricli commands. And close this ticket.
If we decide to implement 'cp' command, we can create new issue for that.
Ok, re: cp, just heard from a user today about their preference for cyberduck, doesn't even use download or archive commands. I'm aiming to hear from more users over the next couple of weeks.
Just had an idea, I think Sina or somebody else mentioned whether we should have something similar to "~" symbol for grid.
How about we use "#"? As additional shortcut? The character looks like a grid anyway, and if I'm not mistaken, it's not really used for anything else in urls...
It would point to the user's "default grid filespace", something akin to a remote home dir. At the moment it would point to the DF home of the user (so it would be a shortcut for grid://groups/nz/nesi). Might be a nicer way to attach input files:
attach #/blastjobInput/file1.txt
attach #/archived-jobs/old_job_25/result_from_other_job.file
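A minimal sketch of how such a shortcut might be expanded, assuming "#" is only special as the very first path component and that the default grid filespace is the grid://groups/nz/nesi location mentioned above. `expand_shortcut` and `DEFAULT_GRID_HOME` are hypothetical names, not existing gricli code.

```python
# Assumed default, taken from the example in this thread; in practice it
# would presumably depend on the user.
DEFAULT_GRID_HOME = "grid://groups/nz/nesi"


def expand_shortcut(path: str, grid_home: str = DEFAULT_GRID_HOME) -> str:
    """Expand a leading "#" to the user's default grid filespace."""
    if path == "#":
        return grid_home
    if path.startswith("#/"):
        return grid_home.rstrip("/") + path[1:]
    # Anything else (local paths, full urls, names containing "#") is untouched.
    return path


print(expand_shortcut("#/blastjobInput/file1.txt"))
```

Restricting the expansion to a leading "#" keeps local file names that merely contain the character working as before.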
Re: cyberduck. That is fine as long as users can access the filespace via ssh. I don't think it supports gridftp though. Also, no tool would support our virtual filesystem. If a user knows that the file he wants is under grid://groups/nz/drug_disovery/folder/file, how would he know where to connect to, and whether that particular filesystem is published via ssh?
Also, it can't be used in gricli scripts, which, sooner or later, people might want to do...
Closing this now.
I think grid:// support, improved tab-completion and ls command do actually make file system access less pain. Still room for improvements, but for that we should open new, more specific issues I guess...
I think we should have a little discussion on how we want gricli to present files/filesystems.
The way I see it, there are 2 options:
1) we do it like ftp clients, distinguishing between local filesystem and remote one. Each filesystem gets its own set of commands. I.e. lcd / cd (if I remember right, been a while since I used ftp)
2) we treat everything as one filespace, only the protocol of the urls is different between local and remote. I.e.
for local we could have those options:
file://home/markus/file.txt
or
local://home/markus/file.txt
and also, as an optional short option
/home/markus/file.txt
and for remote we would have absolute urls like
gsiftp://ng2.auckland.ac.nz/home/markus/file.txt
as well as virtual ones:
grid://groups/nz/nesi/markus.binsteiner/file.txt
grid://jobs/grisu_job/file.txt
grid://sites/auckland/ng2.auckland.ac.nz/home/markus/file.txt
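Going only by the examples above, the "sites" form appears to embed the gridftp host in the path, so for that one case the mapping to an absolute url could be sketched as below. This is a reading of the examples and an assumption, not the actual resolution logic; urls under "groups" or "jobs" carry no host and would need a lookup that this sketch leaves out. `sites_url_to_gsiftp` is a made-up name.

```python
def sites_url_to_gsiftp(url: str) -> str:
    """Map grid://sites/<site>/<host>/<path> to gsiftp://<host>/<path>."""
    prefix = "grid://sites/"
    if not url.startswith(prefix):
        raise ValueError(f"not a grid://sites/ url: {url}")
    # Split into site name, host name and the remaining file path.
    parts = url[len(prefix):].split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"expected grid://sites/<site>/<host>/...: {url}")
    host = parts[1]
    path = parts[2] if len(parts) == 3 else ""
    return f"gsiftp://{host}/{path}"


print(sites_url_to_gsiftp(
    "grid://sites/auckland/ng2.auckland.ac.nz/home/markus/file.txt"))
```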
I would opt for version 2 since I think it'd be easier to understand for users:
Disadvantage of this:
We could also "mount" the grid onto the proper filesystem paths, something like:
/grid/groups/nz/nesi/myfile.txt
But that might be more confusing since that path would disappear as soon as people don't use gricli.
Opinions?