Open JulsGranados opened 3 years ago
Can we add the ability to complete a full sync (including deletes)? If it's not currently possible, please treat this as an urgent feature request.
Thanks
+1 - Also require this feature. Thanks
+1
+1
I'm currently looking for alternatives such as cleaning the directory and then re-deploying. This isn't ideal, and using PowerShell to manually compare and delete might be too complex.
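For reference, a minimal sketch of the clean-then-redeploy alternative, assuming the legacy Databricks CLI with a configured profile named AZDO; the paths are hypothetical placeholders to replace with your own:

# Hypothetical paths; replace with your workspace folder and local notebook folder.
TARGET="/Shared/notebooks"
SOURCE="./notebooks"
# Clear the target folder (ignore the error if it doesn't exist yet), then re-import everything.
databricks workspace rm -r --profile AZDO "$TARGET" || true
databricks workspace import_dir --profile AZDO "$SOURCE" "$TARGET"

Keep in mind that anything referencing these notebooks will fail if it runs in the window between the delete and the re-import.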
+1
As a quick fix, you can add the following bash script as a task that cleans the workspace folder before deploying the notebooks. But I agree that this should be an option in the "Deploy Notebook" task itself.
# Replace </workspace/folder> and </notebook/folder> with your own paths.
workspaces=$(databricks workspace ls --absolute --profile AZDO "</workspace/folder>")
echo "$workspaces" | grep -w -q "</notebook/folder>"; exists=$?
if [ $exists -eq 0 ]
then
    # The folder already exists in the workspace, so delete it before the deploy step runs.
    databricks workspace delete --recursive --profile AZDO "</notebook/folder>"
else
    echo "Workspace does not yet exist and thus cannot be deleted."
fi
Great, thanks @sabacherli for the scripts. However, I've just realised that there is a small chance that scheduled jobs using these notebooks will fail if the referenced notebooks have been deleted but not yet redeployed. In our case I observed a gap of about 30 seconds between the two actions; if a scheduled job kicks off during those 30 seconds, it fails because the notebook doesn't exist.
I may still have to compare the files one by one and delete only the specific files... Hope this feature can be added in the future.
Eventually I wrote my own bash script to do exactly that. Just posting it here in case it helps someone.
- script: |
    SRC=$(Build.SourcesDirectory)/deployment/notebooks
    TGT=.${{ parameters.databricksWorkspaceFolder }}   # a local copy of the workspace notebooks, used only for comparison
    ADB=${{ parameters.databricksWorkspaceFolder }}
    IFS=$'\n'

    # Start with a clean local buffer, then export the current workspace contents into it.
    if [ -d $TGT ];
    then
      rm -r $TGT;
    fi
    databricks workspace export_dir $ADB $TGT --profile AZDO

    echo "Deleting the files not in source code... (no print means no delete)"
    for FILE in `diff -rq $TGT $SRC | grep -E "^Only in $TGT" | sed "s|Only in $TGT\(.*\): \(.*\)|\1/\2|"`
    do
      if [[ $FILE == *.py ]]
      then
        echo "Deleting notebook from workspace -> \"$ADB${FILE%.py}\""
        databricks workspace delete "$ADB${FILE%.py}" --profile AZDO
      else
        echo "Deleting folder from workspace -> \"$ADB$FILE\""
        databricks workspace delete -r "$ADB$FILE" --profile AZDO
      fi
      echo "Also delete from local buffer -> \"$TGT$FILE\""
      rm -r "$TGT$FILE"
      echo
    done

    echo "Deleting empty directories if there are any..."
    while [[ `find $TGT -type d -empty -print` ]];
    do
      find $TGT -type d -empty -print | sed "s|$TGT/\(.*\)|Deleting -> \"$ADB/\1\"|";
      {
        find $TGT -type d -empty -print | sed "s|$TGT/\(.*\)|databricks workspace delete \"$ADB/\1\" --profile AZDO|";
        find $TGT -type d -empty -print | sed "s|$TGT/\(.*\)|rmdir \"$TGT/\1\"|";
      } | bash;
    done
    echo "Done"
    echo

    echo "Removing the temp buffer directory..."
    if [ -d $TGT ];
    then
      rm -r $TGT;
      echo "$TGT has been removed"
    fi
  displayName: 'Delete notebooks and folders not in source'
A fast and easy way to clear the folder is to remove everything recursively with the databricks workspace command in a bash script, run after the Databricks CLI configuration task:
steps:
- bash: 'databricks workspace rm -r --profile AZDO "/Target/Path"'
  displayName: 'Clean Destination'
Please add the functionality described above. I would like the deployment to have an option to delete resources that no longer exist in the build.
The Deploy Notebook task only adds artifacts; it doesn't delete files in the destination workspace when they are removed from the repo in a commit.
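Until the task supports this natively, a workaround sketch is to run a clean-up step between the Databricks CLI configuration task and the Deploy Notebook task, assuming the AZDO profile from the examples above and a placeholder target path:

steps:
# ... Databricks CLI configuration task ...
- bash: 'databricks workspace rm -r --profile AZDO "/Target/Path" || true'   # ignore the error if the folder doesn't exist yet
  displayName: 'Clean destination before deploy'
# ... Deploy Notebook task ...

This guarantees the destination only contains what the deployment puts there, at the cost of the short gap mentioned above where scheduled jobs referencing the notebooks can fail.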