darren1988 opened this issue 3 months ago (status: Open)
Requires a defined architecture before planning.
Meeting scheduled for 29/08/24 to discuss scope and technical architecture for this work
I have the service account credentials from Gwion and have put them into 1Password: OPG - AWS DataSync Service Account, in the AP Shared Account vault
To be discussed at refinement.
Reached out to @ministryofjustice/modernisation-platform about adding their shared platform VPC into our ingestion account
Have agreed with @ministryofjustice/modernisation-platform that this isn't a problem; we can add the shared VPC. Will inspect environment code in modernisation-platform and modernisation-platform-environments
Shared VPC added to the ingestion account; however, upon further reading, DataSync does not support shared VPCs
Plan is to create VPCs using existing ranges from MP that are soon to be retired and were never connected to the MoJ TGW
VPC build-out in progress, EC2 instance build-out also in progress.
However, DataSync registration is not programmatic: the DataSync server needs to be accessible from whatever machine is running Terraform (or registering manually in the console). This is problematic because:
1) GitHub Actions is our primary CI/CD system; it has no internal connectivity to our VPC, and making that work isn't really in scope because it's MP's system
2) Our VPC currently has no public connectivity
3) From what I've read, the AWS-provided AMI does not include the SSM agent; instead you need to connect via SSH (I tried the serial console, but the default admin / password credentials didn't work)
4) The above presents a chicken/egg problem 🤔
Do I open the endpoint to GitHub Actions? GlobalProtect?
Do I add user data to install the SSM agent and write the activation key to Secrets Manager? I don't even know if the activation key is held on disk or if I'd have to run a command...
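For context on the activation mechanics, a hedged sketch based on my reading of AWS's documented HTTP activation flow for DataSync agents (the IP, region, and agent name below are placeholders, not our real values): the agent serves its activation key over plain HTTP on port 80, so any host that can reach it can fetch the key, and the registration itself is then an ordinary AWS API call that needs no connectivity to the agent.

```shell
# Placeholder values -- the real agent IP and region belong to our VPC build-out.
AGENT_IP="10.26.128.43"
REGION="eu-west-2"

# AWS's documented activation URL shape for a DataSync agent; requesting it
# from a host that can reach the agent returns the activation key.
ACTIVATION_URL="http://${AGENT_IP}/?gatewayType=SYNC&activationRegion=${REGION}&no_redirect"
echo "$ACTIVATION_URL"

# From a host that can actually reach the agent on port 80 (not run here):
#   KEY=$(curl -s "$ACTIVATION_URL")
#   aws datasync create-agent --agent-name opg-datasync --activation-key "$KEY"
```

If this holds, only the key-fetch step needs a network path to the agent, which narrows the chicken/egg problem to "what can reach port 80 on the instance".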
10/10/24 update:
Looking at private_link_endpoint, i.e. activating the agent via a VPC interface endpoint rather than over the public internet
16/10/24 update:
TODO:
Currently blocked by https://github.com/ministryofjustice/modernisation-platform/issues/8275
Requested support from mod platform to help unblock this ticket
NVVS/LAN&Wifi team have given me access to https://github.com/ministryofjustice/deployment-tgw, so I'm not as blocked as last week 🙏
Moving back to blocked pending information on connecting to DOM1 from AWS
Blocked until https://github.com/ministryofjustice/modernisation-platform/pull/8322 is deployed
31/10/24 update (spooky edition 🎃):
[root@ip-10-26-128-43 bin]# dig dom1.infra.int
;; communications error to 10.26.128.2#53: timed out
;; communications error to 10.26.128.2#53: timed out
;; communications error to 10.26.128.2#53: timed out
⬆ A pull request added the staff device production VPC to the mod platform TGW route table; still not responding...
Traffic is arriving in the destination VPC, but the return path from Ark might not be working; need to reach out further
This pull request (https://github.com/ministryofjustice/deployment-tgw/pull/258) has allowed us to look up dom1.infra.int via the MoJO DNS resolver
Blocked again as we liaise with other parties about direct connection into Ark over TGW.
It can route back; we just can't route there, presumably because it's blocked by a Palo Alto
🎉 I am able to connect to DOM1 from my debugging instance! 🎉
Reached out to ATOS because I can't access one of the locations
Have reached out to @gwionap for clarification on source data
Updated locations received from @gwionap, will continue.
Have created a task, but it is failing...
I can't explore this location with smbclient from the debug instance either
smb: \> ls hq/PGO/Shared/Group
do_connect: Connection to eucw4171nas002.dom1.infra.int failed (Error NT_STATUS_IO_TIMEOUT)
Unable to follow dfs referral [\eucw4171nas002.dom1.infra.int\mojshared002$]
do_list: [\hq\PGO\Shared\Group] NT_STATUS_IO_TIMEOUT
Have escalated to @gwionap
More verbose output from smbclient:
smb: \> ls hq/PGO/Shared/Group/
dos_clean_name [\hq\PGO\Shared\Group\]
unix_clean_name [\hq\PGO\Shared\Group\]
signed SMB2 message (sign_algo_id=1)
signed SMB2 message (sign_algo_id=1)
sitename_fetch: No stored sitename for realm ''
namecache_fetch: no entry for eucw4171nas002.dom1.infra.int#20 found.
resolve_hosts: Attempting host lookup for name eucw4171nas002.dom1.infra.int<0x20>
namecache_store: storing 1 address for eucw4171nas002.dom1.infra.int#20: 10.172.69.24
Connecting to 10.172.69.24 at port 445
convert_string_handle: E2BIG: convert_string(UTF-8,CP850): srclen=30 destlen=16 error: No more room
Connecting to 10.172.69.24 at port 139
do_connect: Connection to eucw4171nas002.dom1.infra.int failed (Error NT_STATUS_IO_TIMEOUT)
Unable to follow dfs referral [\eucw4171nas002.dom1.infra.int\mojshared002$]
do_list: [\hq\PGO\Shared\Group\] NT_STATUS_IO_TIMEOUT
signed SMB2 message (sign_algo_id=1)
signed SMB2 message (sign_algo_id=1)
signed SMB2 message (sign_algo_id=1)
signed SMB2 message (sign_algo_id=1)
signed SMB2 message (sign_algo_id=1)
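To separate name resolution (which works, per the namecache lines) from actual reachability, a small triage sketch to run from the debug instance, using the NAS hostname from the smbclient output; it checks whether the two SMB-related ports answer at the TCP level at all, independent of the SMB protocol.

```shell
# Raw TCP reachability check: SMB over TCP (445) and NetBIOS session (139).
# A timeout here, with DNS resolving fine, points at a drop in the network path.
check_port() {
  local host="$1" port="$2"
  if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "port ${port}: open"
  else
    echo "port ${port}: unreachable"
  fi
}

check_port eucw4171nas002.dom1.infra.int 445
check_port eucw4171nas002.dom1.infra.int 139
```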
Seeing the following in VPC flow logs
2 730335344807 eni-05ad72cf6a35649b0 10.26.128.43 10.172.69.24 33266 139 6 3 180 1731627738 1731627767 ACCEPT OK
So maybe it's the routing back from ATOS?
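For reference, the fields in that record (the default v2 flow log format), pulled out with awk. An ACCEPT here only means our ENI's security group/NACL allowed the packets out; it says nothing about what happens to them further along the path, so the absence of healthy return traffic is the interesting part.

```shell
# Default VPC flow log v2 field order:
# version account eni srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
record='2 730335344807 eni-05ad72cf6a35649b0 10.26.128.43 10.172.69.24 33266 139 6 3 180 1731627738 1731627767 ACCEPT OK'
echo "$record" | awk '{printf "src=%s dst=%s dstport=%s proto=%s action=%s\n", $4, $5, $7, $8, $13}'
# proto=6 is TCP; dstport 139 is the NetBIOS session service
```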
SMB traffic is being dropped at the Palo Altos 💀
@bagg3rs has raised a demand with Tech Services
Describe the feature request.
Describe the context.
We embarked on this originally earlier in the year, when the request came in for a DataSync instance that would allow OPG to move various pieces of unstructured/semi-structured data (PDFs, documents, etc.) into the Analytical Platform, so that they could be accessed directly from the AP without having to download files from a fileshare and manually re-upload them. This would allow the data to be automatically replicated to our account from the fileshare, meaning analysts would be able to natively access all their files. This was pending the creation of a service account from ATOS for a good while, but said account has now been created.
Work required:
We need to create an AWS DataSync instance and set it up to connect to/authenticate with the fileshare, using the service account provided by ATOS
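A hedged sketch of that wiring: the NAS host and share path below are examples taken from the debugging earlier in this issue, and the user/domain/ARN variables are placeholders rather than the real values. The shape is an SMB source location authenticated with the service account, plus a task joining it to a destination location.

```shell
# Example values only: NAS host and share path as seen earlier in this issue.
SMB_SERVER="eucw4171nas002.dom1.infra.int"
SMB_SUBDIR="/hq/PGO/Shared/Group"
echo "source: smb://${SMB_SERVER}${SMB_SUBDIR}"

# From a host with AWS credentials (placeholder variables, not run here):
#   aws datasync create-location-smb \
#     --server-hostname "$SMB_SERVER" \
#     --subdirectory "$SMB_SUBDIR" \
#     --user "$SERVICE_ACCOUNT_USER" \
#     --domain DOM1 \
#     --password "$SERVICE_ACCOUNT_PASSWORD" \
#     --agent-arns "$AGENT_ARN"
#   aws datasync create-task \
#     --source-location-arn "$SMB_LOCATION_ARN" \
#     --destination-location-arn "$S3_LOCATION_ARN"
```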
Definition of done